Overview

Dataset statistics

Number of variables: 41
Number of observations: 59400
Missing cells: 46094
Missing cells (%): 1.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 19.0 MiB
Average record size in memory: 336.0 B
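
The missing-cell percentage follows from the other figures in this block: the number of cells is the number of variables times the number of observations. A minimal arithmetic check in Python, using only numbers copied from the statistics above (the variable names are illustrative):

```python
# Figures copied from the "Dataset statistics" block above.
n_variables = 41
n_observations = 59400
missing_cells = 46094

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```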

Variable types

Numeric: 10
DateTime: 1
Categorical: 28
Boolean: 2

Warnings

recorded_by has constant value "GeoData Consultants Ltd" Constant
funder has a high cardinality: 1897 distinct values High cardinality
installer has a high cardinality: 2145 distinct values High cardinality
wpt_name has a high cardinality: 37400 distinct values High cardinality
subvillage has a high cardinality: 19287 distinct values High cardinality
lga has a high cardinality: 125 distinct values High cardinality
ward has a high cardinality: 2092 distinct values High cardinality
scheme_name has a high cardinality: 2696 distinct values High cardinality
public_meeting is highly correlated with recorded_by High correlation
payment_type is highly correlated with recorded_by and 1 other field High correlation
recorded_by is highly correlated with public_meeting and 21 other fields High correlation
quality_group is highly correlated with recorded_by and 1 other field High correlation
source_class is highly correlated with recorded_by and 2 other fields High correlation
water_quality is highly correlated with recorded_by and 1 other field High correlation
management_group is highly correlated with recorded_by and 1 other field High correlation
waterpoint_type_group is highly correlated with recorded_by and 1 other field High correlation
source is highly correlated with recorded_by and 2 other fields High correlation
permit is highly correlated with recorded_by High correlation
extraction_type is highly correlated with recorded_by and 2 other fields High correlation
basin is highly correlated with recorded_by High correlation
quantity is highly correlated with recorded_by and 1 other field High correlation
scheme_management is highly correlated with recorded_by High correlation
waterpoint_type is highly correlated with recorded_by and 1 other field High correlation
status_group is highly correlated with recorded_by High correlation
extraction_type_group is highly correlated with recorded_by and 2 other fields High correlation
payment is highly correlated with payment_type and 1 other field High correlation
management is highly correlated with recorded_by and 1 other field High correlation
region is highly correlated with recorded_by High correlation
quantity_group is highly correlated with recorded_by and 1 other field High correlation
extraction_type_class is highly correlated with recorded_by and 2 other fields High correlation
source_type is highly correlated with recorded_by and 2 other fields High correlation
funder has 3635 (6.1%) missing values Missing
installer has 3655 (6.2%) missing values Missing
public_meeting has 3334 (5.6%) missing values Missing
scheme_management has 3877 (6.5%) missing values Missing
scheme_name has 28166 (47.4%) missing values Missing
permit has 3056 (5.1%) missing values Missing
amount_tsh is highly skewed (γ1 = 57.80779995) Skewed
num_private is highly skewed (γ1 = 91.93374999) Skewed
id is uniformly distributed Uniform
id has unique values Unique
amount_tsh has 41639 (70.1%) zeros Zeros
gps_height has 20438 (34.4%) zeros Zeros
longitude has 1812 (3.1%) zeros Zeros
num_private has 58643 (98.7%) zeros Zeros
population has 21381 (36.0%) zeros Zeros
construction_year has 20709 (34.9%) zeros Zeros
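
The Zeros warnings only count zeros; they do not say whether a zero is a real measurement or a placeholder for "not recorded". As a rough sketch, assuming (this is an assumption, not something the report states) that a zero in construction_year or longitude is a placeholder, such values could be turned into explicit missing values before analysis. The helper name and the sample record are hypothetical.

```python
from typing import Any, Dict, Optional

# Assumption: in these columns a 0 is a placeholder, not a measurement.
# The report itself only counts the zeros; it does not say what they mean.
ZERO_AS_MISSING = ("construction_year", "longitude")


def zeros_to_none(record: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Return a copy of `record` with placeholder zeros replaced by None."""
    cleaned = dict(record)
    for column in ZERO_AS_MISSING:
        if cleaned.get(column) == 0:
            cleaned[column] = None
    return cleaned


# Hypothetical record, not taken from the dataset.
row = {"id": 7, "construction_year": 0, "longitude": 35.1, "amount_tsh": 0.0}
print(zeros_to_none(row))
```

Columns such as amount_tsh or population are left alone here because a zero there could plausibly be a genuine value; which columns belong in the list is a judgment call for the analyst.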

Reproduction

Analysis started: 2021-04-14 13:51:05.320956
Analysis finished: 2021-04-14 13:52:09.009990
Duration: 1 minute and 3.69 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

id
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 59400
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37115.13177
Minimum: 0
Maximum: 74247
Zeros: 1
Zeros (%): < 0.1%
Memory size: 928.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3730.9
Q1: 18519.75
median: 37061.5
Q3: 55656.5
95-th percentile: 70564.05
Maximum: 74247
Range: 74247
Interquartile range (IQR): 37136.75

Descriptive statistics

Standard deviation: 21453.12837
Coefficient of variation (CV): 0.5780156866
Kurtosis: -1.201515029
Mean: 37115.13177
Median Absolute Deviation (MAD): 18568.5
Skewness: 0.00262253035
Sum: 2204638827
Variance: 460236716.9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                  Count   Frequency (%)
0                      1       < 0.1%
19811                  1       < 0.1%
38200                  1       < 0.1%
34106                  1       < 0.1%
36155                  1       < 0.1%
46396                  1       < 0.1%
48445                  1       < 0.1%
42302                  1       < 0.1%
70984                  1       < 0.1%
73033                  1       < 0.1%
Other values (59390)   59390   > 99.9%

Minimum 5 values

Value   Count   Frequency (%)
0       1       < 0.1%
1       1       < 0.1%
2       1       < 0.1%
3       1       < 0.1%
4       1       < 0.1%

Maximum 5 values

Value   Count   Frequency (%)
74247   1       < 0.1%
74246   1       < 0.1%
74243   1       < 0.1%
74242   1       < 0.1%
74240   1       < 0.1%

amount_tsh
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 98
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 317.6503847
Minimum: 0
Maximum: 350000
Zeros: 41639
Zeros (%): 70.1%
Memory size: 928.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 20
95-th percentile: 1200
Maximum: 350000
Range: 350000
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 2997.574558
Coefficient of variation (CV): 9.436709989
Kurtosis: 4903.543102
Mean: 317.6503847
Median Absolute Deviation (MAD): 0
Skewness: 57.80779995
Sum: 18868432.85
Variance: 8985453.232
Monotonicity: Not monotonic
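
Several of the amount_tsh figures are simple functions of one another: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the IQR is Q3 minus Q1. A small consistency check, using only values copied from the blocks above (the variable names are illustrative):

```python
import math

# Values copied from the amount_tsh statistics above.
mean = 317.6503847
std = 2997.574558
q1, q3 = 0.0, 20.0

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the squared standard deviation
iqr = q3 - q1          # interquartile range

print(f"CV={cv:.4f}  variance={variance:.1f}  IQR={iqr}")
```

A CV above 9 together with a median and MAD of 0 is what one would expect from a column where about 70% of values are zero and a few are very large.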
Histogram with fixed size bins (bins=50)
Common values

Value               Count   Frequency (%)
0                   41639   70.1%
500                 3102    5.2%
50                  2472    4.2%
1000                1488    2.5%
20                  1463    2.5%
200                 1220    2.1%
100                 816     1.4%
10                  806     1.4%
30                  743     1.3%
2000                704     1.2%
Other values (88)   4947    8.3%

Minimum 5 values

Value   Count   Frequency (%)
0       41639   70.1%
0.2     3       < 0.1%
0.25    1       < 0.1%
1       3       < 0.1%
2       13      < 0.1%

Maximum 5 values

Value    Count   Frequency (%)
350000   1       < 0.1%
250000   1       < 0.1%
200000   1       < 0.1%
170000   1       < 0.1%
138000   1       < 0.1%
Distinct: 356
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 928.1 KiB
Minimum: 2002-10-14 00:00:00
Maximum: 2013-12-03 00:00:00
Histogram with fixed size bins (bins=50)

funder
Categorical

HIGH CARDINALITY
MISSING

Distinct: 1897
Distinct (%): 3.4%
Missing: 3635
Missing (%): 6.1%
Memory size: 928.1 KiB

Category                 Count
Government Of Tanzania   9084
Danida                   3114
Hesawa                   2202
Rwssp                    1374
World Bank               1349
Other values (1892)      38642
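
With 1897 distinct funders and a long tail of rare ones, a common preprocessing step is to fold infrequent categories into a single bucket. The following is a minimal sketch of that idea, not something the report itself does; the function name, the threshold, and the toy data are all hypothetical.

```python
from collections import Counter


def collapse_rare(values, min_count, other_label="Other"):
    """Replace categories seen fewer than `min_count` times with `other_label`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Hypothetical toy column; the names echo the report, the counts do not.
funders = ["Danida", "Danida", "Hesawa", "Rwssp", "Danida", "Hesawa", "Unicef"]
print(Counter(collapse_rare(funders, min_count=2)))
```

The threshold trades detail for cardinality; a suitable value depends on the downstream model and is not suggested by the report.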

Length

Max length: 30
Median length: 6
Mean length: 9.929902268
Min length: 1

Characters and Unicode

Total characters: 553741
Distinct characters: 69
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 974
Unique (%): 1.7%

Sample

1st row: Roman
2nd row: Grumeti
3rd row: Lottery Club
4th row: Unicef
5th row: Action In A

Common values

Value                    Count   Frequency (%)
Government Of Tanzania   9084    15.3%
Danida                   3114    5.2%
Hesawa                   2202    3.7%
Rwssp                    1374    2.3%
World Bank               1349    2.3%
Kkkt                     1287    2.2%
World Vision             1246    2.1%
Unicef                   1057    1.8%
Tasaf                    877     1.5%
District Council         843     1.4%
Other values (1887)      33332   56.1%
(Missing)                3635    6.1%

Histogram of lengths of the category
ValueCountFrequency (%)
of9748
 
10.8%
government9276
 
10.3%
tanzania9172
 
10.1%
danida3123
 
3.5%
world2789
 
3.1%
water2645
 
2.9%
hesawa2203
 
2.4%
bank1416
 
1.6%
rwssp1376
 
1.5%
kkkt1370
 
1.5%
Other values (2065)47254
52.3%

Most occurring characters

ValueCountFrequency (%)
a68200
 
12.3%
n57842
 
10.4%
i38011
 
6.9%
e37464
 
6.8%
34673
 
6.3%
r27879
 
5.0%
t23016
 
4.2%
o22741
 
4.1%
s17208
 
3.1%
d15464
 
2.8%
Other values (59)211243
38.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter425880
76.9%
Uppercase Letter89705
 
16.2%
Space Separator34673
 
6.3%
Other Punctuation1322
 
0.2%
Decimal Number803
 
0.1%
Open Punctuation437
 
0.1%
Close Punctuation431
 
0.1%
Dash Punctuation323
 
0.1%
Connector Punctuation167
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
T12110
13.5%
G10722
12.0%
O10613
11.8%
D7928
 
8.8%
W7352
 
8.2%
C4679
 
5.2%
R4454
 
5.0%
H3462
 
3.9%
M3135
 
3.5%
K2962
 
3.3%
Other values (16)22288
24.8%
ValueCountFrequency (%)
a68200
16.0%
n57842
13.6%
i38011
 
8.9%
e37464
 
8.8%
r27879
 
6.5%
t23016
 
5.4%
o22741
 
5.3%
s17208
 
4.0%
d15464
 
3.6%
f15329
 
3.6%
Other values (16)102726
24.1%
ValueCountFrequency (%)
/783
59.2%
.469
35.5%
\33
 
2.5%
&26
 
2.0%
'11
 
0.8%
ValueCountFrequency (%)
0793
98.8%
25
 
0.6%
12
 
0.2%
92
 
0.2%
41
 
0.1%
ValueCountFrequency (%)
(434
99.3%
[3
 
0.7%
ValueCountFrequency (%)
)429
99.5%
]2
 
0.5%
ValueCountFrequency (%)
34673
100.0%
ValueCountFrequency (%)
_167
100.0%
ValueCountFrequency (%)
-323
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin515585
93.1%
Common38156
 
6.9%

Most frequent character per script

ValueCountFrequency (%)
a68200
 
13.2%
n57842
 
11.2%
i38011
 
7.4%
e37464
 
7.3%
r27879
 
5.4%
t23016
 
4.5%
o22741
 
4.4%
s17208
 
3.3%
d15464
 
3.0%
f15329
 
3.0%
Other values (42)192431
37.3%
ValueCountFrequency (%)
34673
90.9%
0793
 
2.1%
/783
 
2.1%
.469
 
1.2%
(434
 
1.1%
)429
 
1.1%
-323
 
0.8%
_167
 
0.4%
\33
 
0.1%
&26
 
0.1%
Other values (7)26
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII553741
100.0%

Most frequent character per block

ValueCountFrequency (%)
a68200
 
12.3%
n57842
 
10.4%
i38011
 
6.9%
e37464
 
6.8%
34673
 
6.3%
r27879
 
5.0%
t23016
 
4.2%
o22741
 
4.1%
s17208
 
3.1%
d15464
 
2.8%
Other values (59)211243
38.1%

gps_height
Real number (ℝ)

ZEROS

Distinct: 2428
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 668.2972391
Minimum: -90
Maximum: 2770
Zeros: 20438
Zeros (%): 34.4%
Memory size: 928.1 KiB

Quantile statistics

Minimum: -90
5-th percentile: 0
Q1: 0
median: 369
Q3: 1319.25
95-th percentile: 1797
Maximum: 2770
Range: 2860
Interquartile range (IQR): 1319.25

Descriptive statistics

Standard deviation: 693.1163503
Coefficient of variation (CV): 1.037137833
Kurtosis: -1.292440135
Mean: 668.2972391
Median Absolute Deviation (MAD): 369
Skewness: 0.462402085
Sum: 39696856
Variance: 480410.2751
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
0                     20438   34.4%
-15                   60      0.1%
-16                   55      0.1%
-13                   55      0.1%
1290                  52      0.1%
-20                   52      0.1%
-14                   51      0.1%
303                   51      0.1%
-18                   49      0.1%
-19                   47      0.1%
Other values (2418)   38490   64.8%

Minimum 5 values

Value   Count   Frequency (%)
-90     1       < 0.1%
-63     2       < 0.1%
-59     1       < 0.1%
-57     1       < 0.1%
-55     1       < 0.1%

Maximum 5 values

Value   Count   Frequency (%)
2770    1       < 0.1%
2628    1       < 0.1%
2627    1       < 0.1%
2626    2       < 0.1%
2623    1       < 0.1%

installer
Categorical

HIGH CARDINALITY
MISSING

Distinct: 2145
Distinct (%): 3.8%
Missing: 3655
Missing (%): 6.2%
Memory size: 928.1 KiB

Category              Count
DWE                   17402
Government            1825
RWE                   1206
Commu                 1060
DANIDA                1050
Other values (2140)   33202
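
The report does not say whether some of the 2145 installer values differ only in letter case or spacing, but the mixed casing in this variable's sample rows (for example GRUMETI and World vision) suggests it may be worth checking. A minimal sketch of one way to do so, assuming case and whitespace carry no meaning in this column; the helper name and the raw values are hypothetical.

```python
from collections import Counter


def normalize_label(label):
    """Trim and collapse whitespace, then fold case so variants compare equal."""
    return " ".join(label.split()).casefold()


# Hypothetical raw values illustrating case and whitespace variants.
raw = ["DWE", "dwe", "Dwe ", "World vision", "World Vision", "GRUMETI"]
print(Counter(normalize_label(v) for v in raw))
```

If the distinct count drops noticeably after normalization, part of the high cardinality is a data-entry artifact rather than real variety.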

Length

Max length: 30
Median length: 4
Mean length: 6.111202798
Min length: 1

Characters and Unicode

Total characters: 340669
Distinct characters: 70
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1098
Unique (%): 2.0%

Sample

1st row: Roman
2nd row: GRUMETI
3rd row: World vision
4th row: UNICEF
5th row: Artisan

Common values

Value                 Count   Frequency (%)
DWE                   17402   29.3%
Government            1825    3.1%
RWE                   1206    2.0%
Commu                 1060    1.8%
DANIDA                1050    1.8%
KKKT                  898     1.5%
Hesawa                840     1.4%
0                     777     1.3%
TCRS                  707     1.2%
Central government    622     1.0%
Other values (2135)   29358   49.4%
(Missing)             3655    6.2%

Histogram of lengths of the category
ValueCountFrequency (%)
dwe17601
25.8%
government2778
 
4.1%
water1881
 
2.8%
hesawa1395
 
2.0%
rwe1230
 
1.8%
district1216
 
1.8%
kkkt1153
 
1.7%
council1106
 
1.6%
commu1065
 
1.6%
danida1051
 
1.5%
Other values (1976)37806
55.4%

Most occurring characters

ValueCountFrequency (%)
D27595
 
8.1%
W25849
 
7.6%
E25389
 
7.5%
a17343
 
5.1%
n16558
 
4.9%
e15500
 
4.5%
i15053
 
4.4%
A13668
 
4.0%
r13377
 
3.9%
t12904
 
3.8%
Other values (60)157433
46.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter167438
49.1%
Lowercase Letter158190
46.4%
Space Separator12673
 
3.7%
Other Punctuation971
 
0.3%
Decimal Number783
 
0.2%
Dash Punctuation268
 
0.1%
Connector Punctuation169
 
< 0.1%
Open Punctuation159
 
< 0.1%
Close Punctuation16
 
< 0.1%
Currency Symbol2
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
D27595
16.5%
W25849
15.4%
E25389
15.2%
A13668
8.2%
C10535
 
6.3%
S6659
 
4.0%
R6518
 
3.9%
I6160
 
3.7%
T5948
 
3.6%
K5390
 
3.2%
Other values (16)33727
20.1%
ValueCountFrequency (%)
a17343
11.0%
n16558
10.5%
e15500
9.8%
i15053
9.5%
r13377
8.5%
t12904
 
8.2%
o12398
 
7.8%
m9289
 
5.9%
l6201
 
3.9%
s6173
 
3.9%
Other values (16)33394
21.1%
ValueCountFrequency (%)
/670
69.0%
.238
 
24.5%
&50
 
5.1%
'12
 
1.2%
#1
 
0.1%
ValueCountFrequency (%)
0780
99.6%
11
 
0.1%
41
 
0.1%
91
 
0.1%
ValueCountFrequency (%)
}13
81.2%
]2
 
12.5%
)1
 
6.2%
ValueCountFrequency (%)
(157
98.7%
[2
 
1.3%
ValueCountFrequency (%)
12673
100.0%
ValueCountFrequency (%)
_169
100.0%
ValueCountFrequency (%)
-268
100.0%
ValueCountFrequency (%)
$2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin325628
95.6%
Common15041
 
4.4%

Most frequent character per script

ValueCountFrequency (%)
D27595
 
8.5%
W25849
 
7.9%
E25389
 
7.8%
a17343
 
5.3%
n16558
 
5.1%
e15500
 
4.8%
i15053
 
4.6%
A13668
 
4.2%
r13377
 
4.1%
t12904
 
4.0%
Other values (42)142392
43.7%
ValueCountFrequency (%)
12673
84.3%
0780
 
5.2%
/670
 
4.5%
-268
 
1.8%
.238
 
1.6%
_169
 
1.1%
(157
 
1.0%
&50
 
0.3%
}13
 
0.1%
'12
 
0.1%
Other values (8)11
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII340669
100.0%

Most frequent character per block

ValueCountFrequency (%)
D27595
 
8.1%
W25849
 
7.6%
E25389
 
7.5%
a17343
 
5.1%
n16558
 
4.9%
e15500
 
4.5%
i15053
 
4.4%
A13668
 
4.0%
r13377
 
3.9%
t12904
 
3.8%
Other values (60)157433
46.2%

longitude
Real number (ℝ≥0)

ZEROS

Distinct: 57516
Distinct (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.07742669
Minimum: 0
Maximum: 40.34519307
Zeros: 1812
Zeros (%): 3.1%
Memory size: 928.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 30.04066001
Q1: 33.09034738
median: 34.90874343
Q3: 37.17838657
95-th percentile: 39.13323954
Maximum: 40.34519307
Range: 40.34519307
Interquartile range (IQR): 4.08803919

Descriptive statistics

Standard deviation: 6.567431846
Coefficient of variation (CV): 0.1927208854
Kurtosis: 19.18703105
Mean: 34.07742669
Median Absolute Deviation (MAD): 2.032511095
Skewness: -4.191046455
Sum: 2024199.146
Variance: 43.13116105
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                  Count   Frequency (%)
0                      1812    3.1%
32.9771906             2       < 0.1%
32.91986139            2       < 0.1%
37.54278497            2       < 0.1%
39.10530661            2       < 0.1%
32.98478963            2       < 0.1%
39.10375198            2       < 0.1%
37.54157917            2       < 0.1%
37.28135697            2       < 0.1%
37.32890522            2       < 0.1%
Other values (57506)   57570   96.9%

Minimum 5 values

Value         Count   Frequency (%)
0             1812    3.1%
29.6071219    1       < 0.1%
29.60720109   1       < 0.1%
29.61032056   1       < 0.1%
29.61096482   1       < 0.1%

Maximum 5 values

Value         Count   Frequency (%)
40.34519307   1       < 0.1%
40.34430089   1       < 0.1%
40.32523996   1       < 0.1%
40.32522643   1       < 0.1%
40.32340181   1       < 0.1%

latitude
Real number (ℝ)

Distinct: 57517
Distinct (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -5.70603266
Minimum: -11.64944018
Maximum: -2 × 10⁻⁸
Zeros: 0
Zeros (%): 0.0%
Memory size: 928.1 KiB

Quantile statistics

Minimum: -11.64944018
5-th percentile: -10.58554992
Q1: -8.540621305
median: -5.02159665
Q3: -3.32615564
95-th percentile: -1.408872227
Maximum: -2 × 10⁻⁸
Range: 11.64944016
Interquartile range (IQR): 5.214465665

Descriptive statistics

Standard deviation: 2.946019081
Coefficient of variation (CV): -0.5162990219
Kurtosis: -1.057616666
Mean: -5.70603266
Median Absolute Deviation (MAD): 2.07002988
Skewness: -0.1520365709
Sum: -338938.34
Variance: 8.679028427
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                  Count   Frequency (%)
-2 × 10⁻⁸              1812    3.1%
-2.49454559            2       < 0.1%
-6.98318263            2       < 0.1%
-7.05692253            2       < 0.1%
-7.05637235            2       < 0.1%
-2.48708461            2       < 0.1%
-6.98188419            2       < 0.1%
-6.97826294            2       < 0.1%
-7.06537264            2       < 0.1%
-6.99129411            2       < 0.1%
Other values (57507)   57570   96.9%

Minimum 5 values

Value          Count   Frequency (%)
-11.64944018   1       < 0.1%
-11.64837759   1       < 0.1%
-11.58629656   1       < 0.1%
-11.56857679   1       < 0.1%
-11.56680457   1       < 0.1%

Maximum 5 values

Value         Count   Frequency (%)
-2 × 10⁻⁸     1812    3.1%
-0.99846435   1       < 0.1%
-0.998916     1       < 0.1%
-0.99901209   1       < 0.1%
-0.99911702   1       < 0.1%
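
The latitude value -2 × 10⁻⁸ occurs 1812 times, the same count as longitude 0. That coincidence suggests, although the report does not confirm it, that the same rows carry a placeholder coordinate pair rather than a real location. A rough sketch of how such rows could be flagged, under that assumption; the function name, the tolerance, and the sample points are hypothetical.

```python
def has_placeholder_coordinates(longitude, latitude, tol=1e-6):
    """True when longitude is exactly 0 and latitude is within `tol` of 0."""
    return longitude == 0 and abs(latitude) < tol


# Hypothetical (longitude, latitude) pairs; the first mimics the placeholder.
points = [(0.0, -2e-08), (34.9, -5.02), (0.0, -6.98)]
flags = [has_placeholder_coordinates(lon, lat) for lon, lat in points]
print(flags)
```

Comparing latitude with a tolerance rather than with exact equality matters here, because the placeholder is a tiny nonzero number and so does not show up in the Zeros count for latitude.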

wpt_name
Categorical

HIGH CARDINALITY

Distinct: 37400
Distinct (%): 63.0%
Missing: 0
Missing (%): 0.0%
Memory size: 928.1 KiB

Category               Count
none                   3563
Shuleni                1748
Zahanati               830
Msikitini              535
Kanisani               323
Other values (37395)   52401

Length

Max length: 30
Median length: 10
Mean length: 10.96210438
Min length: 1

Characters and Unicode

Total characters: 651149
Distinct characters: 75
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 32928
Unique (%): 55.4%

Sample

1st row: none
2nd row: Zahanati
3rd row: Kwa Mahundi
4th row: Zahanati Ya Nanyumbu
5th row: Shuleni

Common values

Value                  Count   Frequency (%)
none                   3563    6.0%
Shuleni                1748    2.9%
Zahanati               830     1.4%
Msikitini              535     0.9%
Kanisani               323     0.5%
Bombani                271     0.5%
Sokoni                 260     0.4%
Ofisini                254     0.4%
School                 208     0.4%
Shule Ya Msingi        199     0.3%
Other values (37390)   51209   86.2%

Histogram of lengths of the category
ValueCountFrequency (%)
kwa21384
 
19.6%
none3565
 
3.3%
mzee3385
 
3.1%
shuleni2123
 
1.9%
ya1499
 
1.4%
shule1389
 
1.3%
school1113
 
1.0%
primary1052
 
1.0%
zahanati983
 
0.9%
msingi870
 
0.8%
Other values (29461)71931
65.8%

Most occurring characters

ValueCountFrequency (%)
a98806
15.2%
i52404
 
8.0%
49898
 
7.7%
n42148
 
6.5%
e40985
 
6.3%
w31669
 
4.9%
K31385
 
4.8%
o30247
 
4.6%
u24217
 
3.7%
M22040
 
3.4%
Other values (65)227350
34.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter493422
75.8%
Uppercase Letter105185
 
16.2%
Space Separator49898
 
7.7%
Decimal Number1680
 
0.3%
Other Punctuation741
 
0.1%
Dash Punctuation104
 
< 0.1%
Open Punctuation37
 
< 0.1%
Close Punctuation37
 
< 0.1%
Connector Punctuation24
 
< 0.1%
Modifier Symbol21
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
a98806
20.0%
i52404
10.6%
n42148
 
8.5%
e40985
 
8.3%
w31669
 
6.4%
o30247
 
6.1%
u24217
 
4.9%
l20954
 
4.2%
m17631
 
3.6%
h17215
 
3.5%
Other values (16)117146
23.7%
ValueCountFrequency (%)
K31385
29.8%
M22040
21.0%
S10752
 
10.2%
N4880
 
4.6%
A3497
 
3.3%
B3425
 
3.3%
C2791
 
2.7%
P2564
 
2.4%
L2507
 
2.4%
J2385
 
2.3%
Other values (16)18959
18.0%
ValueCountFrequency (%)
1507
30.2%
2439
26.1%
3152
 
9.0%
4120
 
7.1%
7106
 
6.3%
586
 
5.1%
680
 
4.8%
875
 
4.5%
970
 
4.2%
045
 
2.7%
ValueCountFrequency (%)
'417
56.3%
.175
23.6%
/146
 
19.7%
&2
 
0.3%
\1
 
0.1%
ValueCountFrequency (%)
(29
78.4%
[8
 
21.6%
ValueCountFrequency (%)
)29
78.4%
]8
 
21.6%
ValueCountFrequency (%)
49898
100.0%
ValueCountFrequency (%)
-104
100.0%
ValueCountFrequency (%)
_24
100.0%
ValueCountFrequency (%)
`21
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin598607
91.9%
Common52542
 
8.1%

Most frequent character per script

ValueCountFrequency (%)
a98806
16.5%
i52404
 
8.8%
n42148
 
7.0%
e40985
 
6.8%
w31669
 
5.3%
K31385
 
5.2%
o30247
 
5.1%
u24217
 
4.0%
M22040
 
3.7%
l20954
 
3.5%
Other values (42)203752
34.0%
ValueCountFrequency (%)
49898
95.0%
1507
 
1.0%
2439
 
0.8%
'417
 
0.8%
.175
 
0.3%
3152
 
0.3%
/146
 
0.3%
4120
 
0.2%
7106
 
0.2%
-104
 
0.2%
Other values (13)478
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII651149
100.0%

Most frequent character per block

ValueCountFrequency (%)
a98806
15.2%
i52404
 
8.0%
49898
 
7.7%
n42148
 
6.5%
e40985
 
6.3%
w31669
 
4.9%
K31385
 
4.8%
o30247
 
4.6%
u24217
 
3.7%
M22040
 
3.4%
Other values (65)227350
34.9%

num_private
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 65
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4741414141
Minimum: 0
Maximum: 1776
Zeros: 58643
Zeros (%): 98.7%
Memory size: 928.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 1776
Range: 1776
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 12.23622981
Coefficient of variation (CV): 25.80713147
Kurtosis: 11137.29521
Mean: 0.4741414141
Median Absolute Deviation (MAD): 0
Skewness: 91.93374999
Sum: 28164
Variance: 149.72532
Monotonicity: Not monotonic
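
A skewness above 90 is what a column looks like when almost every value is zero and a handful are very large. The sketch below computes the plain moment skewness g1 = m3 / m2^1.5 on a toy sample to show the effect. It is an illustration under stated assumptions: the report's γ1 is presumably produced by pandas, which applies a bias correction, so figures from this formula would differ slightly from the report's. The function name and sample are hypothetical.

```python
from statistics import fmean


def moment_skewness(values):
    """Population (moment) skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Hypothetical toy sample: mostly zeros with a single large value.
sample = [0, 0, 0, 0, 10]
print(moment_skewness(sample))
```

A symmetric sample such as [1, 2, 3] gives 0 with the same function, which is the baseline against which the report's "highly skewed" warning should be read.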
Histogram with fixed size bins (bins=50)
Common values

Value               Count   Frequency (%)
0                   58643   98.7%
6                   81      0.1%
1                   73      0.1%
8                   46      0.1%
5                   46      0.1%
32                  40      0.1%
45                  36      0.1%
15                  35      0.1%
39                  30      0.1%
93                  28      < 0.1%
Other values (55)   342     0.6%

Minimum 5 values

Value   Count   Frequency (%)
0       58643   98.7%
1       73      0.1%
2       23      < 0.1%
3       27      < 0.1%
4       20      < 0.1%

Maximum 5 values

Value   Count   Frequency (%)
1776    1       < 0.1%
1402    1       < 0.1%
755     1       < 0.1%
698     1       < 0.1%
672     1       < 0.1%

basin
Categorical

HIGH CORRELATION

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 928.1 KiB

Category           Count
Lake Victoria      10248
Pangani            8940
Rufiji             7976
Internal           7785
Lake Tanganyika    6432
Other values (4)   18019

Length

Max length: 23
Median length: 10
Mean length: 10.8923569
Min length: 6

Characters and Unicode

Total characters: 647006
Distinct characters: 32
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Lake Nyasa
2nd row: Lake Victoria
3rd row: Pangani
4th row: Ruvuma / Southern Coast
5th row: Lake Victoria

Common values

Value                     Count   Frequency (%)
Lake Victoria             10248   17.3%
Pangani                   8940    15.1%
Rufiji                    7976    13.4%
Internal                  7785    13.1%
Lake Tanganyika           6432    10.8%
Wami / Ruvu               5987    10.1%
Lake Nyasa                5085    8.6%
Ruvuma / Southern Coast   4493    7.6%
Lake Rukwa                2454    4.1%

Histogram of lengths of the category
ValueCountFrequency (%)
lake24219
22.2%
10480
9.6%
victoria10248
9.4%
pangani8940
 
8.2%
rufiji7976
 
7.3%
internal7785
 
7.1%
tanganyika6432
 
5.9%
wami5987
 
5.5%
ruvu5987
 
5.5%
nyasa5085
 
4.7%
Other values (4)15933
14.6%

Most occurring characters

ValueCountFrequency (%)
a107025
16.5%
i57807
 
8.9%
n50807
 
7.9%
49672
 
7.7%
e36497
 
5.6%
u35883
 
5.5%
k33105
 
5.1%
t27019
 
4.2%
L24219
 
3.7%
r22526
 
3.5%
Other values (22)202446
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter488262
75.5%
Uppercase Letter98592
 
15.2%
Space Separator49672
 
7.7%
Other Punctuation10480
 
1.6%

Most frequent character per category

ValueCountFrequency (%)
a107025
21.9%
i57807
11.8%
n50807
10.4%
e36497
 
7.5%
u35883
 
7.3%
k33105
 
6.8%
t27019
 
5.5%
r22526
 
4.6%
o19234
 
3.9%
g15372
 
3.1%
Other values (10)82987
17.0%
ValueCountFrequency (%)
L24219
24.6%
R20910
21.2%
V10248
10.4%
P8940
 
9.1%
I7785
 
7.9%
T6432
 
6.5%
W5987
 
6.1%
N5085
 
5.2%
S4493
 
4.6%
C4493
 
4.6%
ValueCountFrequency (%)
49672
100.0%
ValueCountFrequency (%)
/10480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin586854
90.7%
Common60152
 
9.3%

Most frequent character per script

ValueCountFrequency (%)
a107025
18.2%
i57807
 
9.9%
n50807
 
8.7%
e36497
 
6.2%
u35883
 
6.1%
k33105
 
5.6%
t27019
 
4.6%
L24219
 
4.1%
r22526
 
3.8%
R20910
 
3.6%
Other values (20)171056
29.1%
ValueCountFrequency (%)
49672
82.6%
/10480
 
17.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII647006
100.0%

Most frequent character per block

ValueCountFrequency (%)
a107025
16.5%
i57807
 
8.9%
n50807
 
7.9%
49672
 
7.7%
e36497
 
5.6%
u35883
 
5.5%
k33105
 
5.1%
t27019
 
4.2%
L24219
 
3.7%
r22526
 
3.5%
Other values (22)202446
31.3%

subvillage
Categorical

HIGH CARDINALITY

Distinct: 19287
Distinct (%): 32.7%
Missing: 371
Missing (%): 0.6%
Memory size: 928.1 KiB

Category               Count
Madukani               508
Shuleni                506
Majengo                502
Kati                   373
Mtakuja                262
Other values (19282)   56878

Length

Max length: 30
Median length: 7
Mean length: 7.897592709
Min length: 1

Characters and Unicode

Total characters: 466187
Distinct characters: 73
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9424
Unique (%): 16.0%

Sample

1st row: Mnyusi B
2nd row: Nyamara
3rd row: Majengo
4th row: Mahakamani
5th row: Kyanyamisa

Common values

Value                  Count   Frequency (%)
Madukani               508     0.9%
Shuleni                506     0.9%
Majengo                502     0.8%
Kati                   373     0.6%
Mtakuja                262     0.4%
Sokoni                 232     0.4%
M                      187     0.3%
Muungano               172     0.3%
Mbuyuni                164     0.3%
Mlimani                152     0.3%
Other values (19277)   55971   94.2%
(Missing)              371     0.6%

Histogram of lengths of the category
ValueCountFrequency (%)
a2387
 
3.4%
b2043
 
2.9%
kati1902
 
2.7%
majengo610
 
0.9%
wa600
 
0.8%
shuleni593
 
0.8%
madukani569
 
0.8%
mtaa514
 
0.7%
juu403
 
0.6%
mjini378
 
0.5%
Other values (17024)60795
85.9%

Most occurring characters

ValueCountFrequency (%)
a72003
15.4%
i45666
 
9.8%
n33499
 
7.2%
u26424
 
5.7%
e25671
 
5.5%
o23556
 
5.1%
M20431
 
4.4%
g18951
 
4.1%
l16372
 
3.5%
m15053
 
3.2%
Other values (63)168561
36.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter381263
81.8%
Uppercase Letter71291
 
15.3%
Space Separator11766
 
2.5%
Other Punctuation1184
 
0.3%
Decimal Number589
 
0.1%
Modifier Symbol45
 
< 0.1%
Dash Punctuation36
 
< 0.1%
Open Punctuation5
 
< 0.1%
Close Punctuation5
 
< 0.1%
Connector Punctuation3
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
a72003
18.9%
i45666
12.0%
n33499
 
8.8%
u26424
 
6.9%
e25671
 
6.7%
o23556
 
6.2%
g18951
 
5.0%
l16372
 
4.3%
m15053
 
3.9%
b11843
 
3.1%
Other values (16)92225
24.2%
ValueCountFrequency (%)
M20431
28.7%
K12545
17.6%
N6068
 
8.5%
B5112
 
7.2%
I4503
 
6.3%
S4039
 
5.7%
A3076
 
4.3%
C2533
 
3.6%
L2458
 
3.4%
U1704
 
2.4%
Other values (15)8822
12.4%
ValueCountFrequency (%)
1242
41.1%
270
 
11.9%
350
 
8.5%
449
 
8.3%
633
 
5.6%
832
 
5.4%
932
 
5.4%
030
 
5.1%
529
 
4.9%
722
 
3.7%
ValueCountFrequency (%)
'1017
85.9%
/136
 
11.5%
.29
 
2.4%
#2
 
0.2%
ValueCountFrequency (%)
(4
80.0%
[1
 
20.0%
ValueCountFrequency (%)
)4
80.0%
]1
 
20.0%
ValueCountFrequency (%)
11766
100.0%
ValueCountFrequency (%)
`45
100.0%
ValueCountFrequency (%)
-36
100.0%
ValueCountFrequency (%)
_3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin452554
97.1%
Common13633
 
2.9%

Most frequent character per script

ValueCountFrequency (%)
a72003
15.9%
i45666
 
10.1%
n33499
 
7.4%
u26424
 
5.8%
e25671
 
5.7%
o23556
 
5.2%
M20431
 
4.5%
g18951
 
4.2%
l16372
 
3.6%
m15053
 
3.3%
Other values (41)154928
34.2%
ValueCountFrequency (%)
11766
86.3%
'1017
 
7.5%
1242
 
1.8%
/136
 
1.0%
270
 
0.5%
350
 
0.4%
449
 
0.4%
`45
 
0.3%
-36
 
0.3%
633
 
0.2%
Other values (12)189
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII466187
100.0%

Most frequent character per block

ValueCountFrequency (%)
a72003
15.4%
i45666
 
9.8%
n33499
 
7.2%
u26424
 
5.7%
e25671
 
5.5%
o23556
 
5.1%
M20431
 
4.4%
g18951
 
4.1%
l16372
 
3.5%
m15053
 
3.2%
Other values (63)168561
36.2%

region
Categorical

HIGH CORRELATION

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 928.1 KiB

Category            Count
Iringa              5294
Shinyanga           4982
Mbeya               4639
Kilimanjaro         4379
Morogoro            4006
Other values (16)   36100

Length

Max length: 13
Median length: 6
Mean length: 6.623754209
Min length: 4

Characters and Unicode

Total characters: 393451
Distinct characters: 32
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Iringa
2nd row: Mara
3rd row: Manyara
4th row: Mtwara
5th row: Kagera

Common values

Value               Count   Frequency (%)
Iringa              5294    8.9%
Shinyanga           4982    8.4%
Mbeya               4639    7.8%
Kilimanjaro         4379    7.4%
Morogoro            4006    6.7%
Arusha              3350    5.6%
Kagera              3316    5.6%
Mwanza              3102    5.2%
Kigoma              2816    4.7%
Ruvuma              2640    4.4%
Other values (11)   20876   35.1%

Histogram of lengths of the category
ValueCountFrequency (%)
iringa5294
 
8.7%
shinyanga4982
 
8.2%
mbeya4639
 
7.6%
kilimanjaro4379
 
7.2%
morogoro4006
 
6.6%
arusha3350
 
5.5%
kagera3316
 
5.4%
mwanza3102
 
5.1%
kigoma2816
 
4.6%
ruvuma2640
 
4.3%
Other values (13)22486
36.9%

Most occurring characters

ValueCountFrequency (%)
a83413
21.2%
n33143
 
8.4%
r32397
 
8.2%
i31763
 
8.1%
o29580
 
7.5%
g25054
 
6.4%
M17029
 
4.3%
m12841
 
3.3%
y11204
 
2.8%
K10511
 
2.7%
Other values (22)106516
27.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter331636
84.3%
Uppercase Letter60205
 
15.3%
Space Separator1610
 
0.4%

Most frequent character per category

ValueCountFrequency (%)
a83413
25.2%
n33143
 
10.0%
r32397
 
9.8%
i31763
 
9.6%
o29580
 
8.9%
g25054
 
7.6%
m12841
 
3.9%
y11204
 
3.4%
u10438
 
3.1%
w9275
 
2.8%
Other values (11)52528
15.8%
ValueCountFrequency (%)
M17029
28.3%
K10511
17.5%
S7880
13.1%
I5294
 
8.8%
T4506
 
7.5%
R4448
 
7.4%
A3350
 
5.6%
D3006
 
5.0%
P2635
 
4.4%
L1546
 
2.6%
ValueCountFrequency (%)
1610
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin391841
99.6%
Common1610
 
0.4%

Most frequent character per script

ValueCountFrequency (%)
a83413
21.3%
n33143
 
8.5%
r32397
 
8.3%
i31763
 
8.1%
o29580
 
7.5%
g25054
 
6.4%
M17029
 
4.3%
m12841
 
3.3%
y11204
 
2.9%
K10511
 
2.7%
Other values (21)104906
26.8%
ValueCountFrequency (%)
1610
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII393451
100.0%

Most frequent character per block

ValueCountFrequency (%)
a83413
21.2%
n33143
 
8.4%
r32397
 
8.2%
i31763
 
8.1%
o29580
 
7.5%
g25054
 
6.4%
M17029
 
4.3%
m12841
 
3.3%
y11204
 
2.8%
K10511
 
2.7%
Other values (22)106516
27.1%

region_code
Real number (ℝ≥0)

Distinct: 27
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.29700337
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 928.1 KiB
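
The region variable has 21 distinct values while region_code has 27, so the two cannot map one-to-one. The report does not show which names share or split codes. A minimal sketch of how that could be inspected, given (region, region_code) pairs from the data; the function name and the sample pairs are hypothetical and not taken from the dataset.

```python
from collections import defaultdict


def codes_per_region(pairs):
    """Map each region name to the set of region_code values seen with it."""
    mapping = defaultdict(set)
    for region, code in pairs:
        mapping[region].add(code)
    return dict(mapping)


# Hypothetical (region, region_code) pairs, not taken from the dataset.
pairs = [("Region A", 1), ("Region B", 2), ("Region C", 60), ("Region C", 99)]
multi = {r: sorted(c) for r, c in codes_per_region(pairs).items() if len(c) > 1}
print(multi)
```

Regions that appear in the result carry more than one code, which is the kind of mismatch the differing distinct counts point to.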

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 5
median: 12
Q3: 17
95-th percentile: 60
Maximum: 99
Range: 98
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 17.58740634
Coefficient of variation (CV): 1.149728866
Kurtosis: 10.28843341
Mean: 15.29700337
Median Absolute Deviation (MAD): 6
Skewness: 3.17381811
Sum: 908642
Variance: 309.3168617
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=27)
Common values

Value               Count   Frequency (%)
11                  5300    8.9%
17                  5011    8.4%
12                  4639    7.8%
3                   4379    7.4%
5                   4040    6.8%
18                  3324    5.6%
19                  3047    5.1%
2                   3024    5.1%
16                  2816    4.7%
10                  2640    4.4%
Other values (17)   21180   35.7%

Minimum 5 values

Value   Count   Frequency (%)
1       2201    3.7%
2       3024    5.1%
3       4379    7.4%
4       2513    4.2%
5       4040    6.8%

Maximum 5 values

Value   Count   Frequency (%)
99      423     0.7%
90      917     1.5%
80      1238    2.1%
60      1025    1.7%
40      1       < 0.1%

district_code
Real number (ℝ≥0)

Distinct20
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5.629747475
Minimum0
Maximum80
Zeros23
Zeros (%)< 0.1%
Memory size928.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 2
Median: 3
Q3: 5
95th percentile: 30
Maximum: 80
Range: 80
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 9.633648629
Coefficient of variation (CV): 1.711204396
Kurtosis: 16.21428363
Mean: 5.629747475
Median Absolute Deviation (MAD): 1
Skewness: 3.962045299
Sum: 334407
Variance: 92.80718592
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)
Common values
Value              Count   Frequency (%)
1                  12203   20.5%
2                  11173   18.8%
3                  9998    16.8%
4                  8999    15.1%
5                  4356    7.3%
6                  4074    6.9%
7                  3343    5.6%
8                  1043    1.8%
30                 995     1.7%
33                 874     1.5%
Other values (10)  2342    3.9%

Minimum 5 values
Value  Count  Frequency (%)
0      23     < 0.1%
1      12203  20.5%
2      11173  18.8%
3      9998   16.8%
4      8999   15.1%

Maximum 5 values
Value  Count  Frequency (%)
80     12     < 0.1%
67     6      < 0.1%
63     195    0.3%
62     109    0.2%
60     63     0.1%

lga
Categorical

HIGH CARDINALITY

Distinct125
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
Njombe
 
2503
Arusha Rural
 
1252
Moshi Rural
 
1251
Bariadi
 
1177
Rungwe
 
1106
Other values (120)
52111 

Length

Max length16
Median length6
Mean length7.416885522
Min length3

Characters and Unicode

Total characters440563
Distinct characters41
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
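
As an illustration of the character properties mentioned here, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` and the small sample (the five sample rows of lga plus "Arusha Rural") are chosen for illustration; this is not how the report itself was produced. The two-letter codes map to the names used in the report: `Ll` is Lowercase Letter, `Lu` is Uppercase Letter, `Zs` is Space Separator.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Small illustrative sample taken from values shown in the lga section.
sample = ["Ludewa", "Serengeti", "Simanjiro", "Nanyumbu", "Karagwe", "Arusha Rural"]
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```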

Unique

Unique1
Unique (%)< 0.1%

Sample

1st rowLudewa
2nd rowSerengeti
3rd rowSimanjiro
4th rowNanyumbu
5th rowKaragwe
ValueCountFrequency (%)
Njombe2503
 
4.2%
Arusha Rural1252
 
2.1%
Moshi Rural1251
 
2.1%
Bariadi1177
 
2.0%
Rungwe1106
 
1.9%
Kilosa1094
 
1.8%
Kasulu1047
 
1.8%
Mbozi1034
 
1.7%
Meru1009
 
1.7%
Bagamoyo997
 
1.7%
Other values (115)46930
79.0%
Histogram of lengths of the category
ValueCountFrequency (%)
rural9552
 
13.5%
njombe2503
 
3.5%
urban1683
 
2.4%
moshi1330
 
1.9%
arusha1315
 
1.9%
bariadi1177
 
1.7%
singida1172
 
1.7%
rungwe1106
 
1.6%
kilosa1094
 
1.5%
kasulu1047
 
1.5%
Other values (106)48656
68.9%

Most occurring characters

ValueCountFrequency (%)
a69982
15.9%
o30079
 
6.8%
i29483
 
6.7%
u28324
 
6.4%
r26886
 
6.1%
e22579
 
5.1%
n22521
 
5.1%
l19238
 
4.4%
g18385
 
4.2%
M16017
 
3.6%
Other values (31)157069
35.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter358693
81.4%
Uppercase Letter70635
 
16.0%
Space Separator11235
 
2.6%

Most frequent character per category

ValueCountFrequency (%)
a69982
19.5%
o30079
 
8.4%
i29483
 
8.2%
u28324
 
7.9%
r26886
 
7.5%
e22579
 
6.3%
n22521
 
6.3%
l19238
 
5.4%
g18385
 
5.1%
m15622
 
4.4%
Other values (14)75594
21.1%
ValueCountFrequency (%)
M16017
22.7%
R12207
17.3%
K11663
16.5%
S6261
 
8.9%
N5760
 
8.2%
B4839
 
6.9%
U3410
 
4.8%
I2480
 
3.5%
L2131
 
3.0%
T1367
 
1.9%
Other values (6)4500
 
6.4%
ValueCountFrequency (%)
11235
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin429328
97.4%
Common11235
 
2.6%

Most frequent character per script

ValueCountFrequency (%)
a69982
16.3%
o30079
 
7.0%
i29483
 
6.9%
u28324
 
6.6%
r26886
 
6.3%
e22579
 
5.3%
n22521
 
5.2%
l19238
 
4.5%
g18385
 
4.3%
M16017
 
3.7%
Other values (30)145834
34.0%
ValueCountFrequency (%)
11235
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII440563
100.0%

Most frequent character per block

ValueCountFrequency (%)
a69982
15.9%
o30079
 
6.8%
i29483
 
6.7%
u28324
 
6.4%
r26886
 
6.1%
e22579
 
5.1%
n22521
 
5.1%
l19238
 
4.4%
g18385
 
4.2%
M16017
 
3.6%
Other values (31)157069
35.7%

ward
Categorical

HIGH CARDINALITY

Distinct2092
Distinct (%)3.5%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
Igosi
 
307
Imalinyi
 
252
Siha Kati
 
232
Mdandu
 
231
Nduruma
 
217
Other values (2087)
58161 

Length

Max length23
Median length7
Mean length7.505841751
Min length3

Characters and Unicode

Total characters445847
Distinct characters54
Distinct categories5
Distinct scripts2
Distinct blocks1

Unique

Unique30
Unique (%)0.1%

Sample

1st rowMundindi
2nd rowNatta
3rd rowNgorika
4th rowNanyumbu
5th rowNyakasimbi
ValueCountFrequency (%)
Igosi307
 
0.5%
Imalinyi252
 
0.4%
Siha Kati232
 
0.4%
Mdandu231
 
0.4%
Nduruma217
 
0.4%
Kitunda203
 
0.3%
Mishamo203
 
0.3%
Msindo201
 
0.3%
Chalinze196
 
0.3%
Maji ya Chai190
 
0.3%
Other values (2082)57168
96.2%
Histogram of lengths of the category
ValueCountFrequency (%)
mashariki580
 
0.9%
urban540
 
0.8%
siha434
 
0.7%
kusini393
 
0.6%
magharibi362
 
0.6%
igosi307
 
0.5%
masama303
 
0.5%
machame293
 
0.5%
kati270
 
0.4%
imalinyi252
 
0.4%
Other values (2106)61033
94.2%

Most occurring characters

ValueCountFrequency (%)
a69533
15.6%
i40243
 
9.0%
n29584
 
6.6%
u27015
 
6.1%
o26093
 
5.9%
e23589
 
5.3%
g21166
 
4.7%
M18916
 
4.2%
m16216
 
3.6%
l15799
 
3.5%
Other values (44)157693
35.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter374730
84.0%
Uppercase Letter64523
 
14.5%
Space Separator5408
 
1.2%
Other Punctuation1163
 
0.3%
Dash Punctuation23
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
M18916
29.3%
K11212
17.4%
I6094
 
9.4%
N5919
 
9.2%
S3354
 
5.2%
L3162
 
4.9%
B3098
 
4.8%
U2913
 
4.5%
C2123
 
3.3%
R1692
 
2.6%
Other values (15)6040
 
9.4%
ValueCountFrequency (%)
a69533
18.6%
i40243
10.7%
n29584
 
7.9%
u27015
 
7.2%
o26093
 
7.0%
e23589
 
6.3%
g21166
 
5.6%
m16216
 
4.3%
l15799
 
4.2%
r13057
 
3.5%
Other values (15)92435
24.7%
ValueCountFrequency (%)
'1013
87.1%
/150
 
12.9%
ValueCountFrequency (%)
5408
100.0%
ValueCountFrequency (%)
-23
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin439253
98.5%
Common6594
 
1.5%

Most frequent character per script

ValueCountFrequency (%)
a69533
15.8%
i40243
 
9.2%
n29584
 
6.7%
u27015
 
6.2%
o26093
 
5.9%
e23589
 
5.4%
g21166
 
4.8%
M18916
 
4.3%
m16216
 
3.7%
l15799
 
3.6%
Other values (40)151099
34.4%
ValueCountFrequency (%)
5408
82.0%
'1013
 
15.4%
/150
 
2.3%
-23
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII445847
100.0%

Most frequent character per block

ValueCountFrequency (%)
a69533
15.6%
i40243
 
9.0%
n29584
 
6.6%
u27015
 
6.1%
o26093
 
5.9%
e23589
 
5.3%
g21166
 
4.7%
M18916
 
4.2%
m16216
 
3.6%
l15799
 
3.5%
Other values (44)157693
35.4%

population
Real number (ℝ≥0)

ZEROS

Distinct1049
Distinct (%)1.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean179.9099832
Minimum0
Maximum30500
Zeros21381
Zeros (%)36.0%
Memory size928.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 25
Q3: 215
95th percentile: 680
Maximum: 30500
Range: 30500
Interquartile range (IQR): 215

Descriptive statistics

Standard deviation: 471.4821757
Coefficient of variation (CV): 2.620655994
Kurtosis: 402.2801153
Mean: 179.9099832
Median Absolute Deviation (MAD): 25
Skewness: 12.66071359
Sum: 10686653
Variance: 222295.442
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                Count   Frequency (%)
0                    21381   36.0%
1                    7025    11.8%
200                  1940    3.3%
150                  1892    3.2%
250                  1681    2.8%
300                  1476    2.5%
100                  1146    1.9%
50                   1139    1.9%
500                  1009    1.7%
350                  986     1.7%
Other values (1039)  19725   33.2%

Minimum 5 values
Value  Count  Frequency (%)
0      21381  36.0%
1      7025   11.8%
2      4      < 0.1%
3      4      < 0.1%
4      13     < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
30500  1      < 0.1%
15300  1      < 0.1%
11463  1      < 0.1%
10000  3      < 0.1%
9865   1      < 0.1%
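
With 36.0% zeros, the population summary depends heavily on how those zeros are read. The sketch below compares the median with and without a sentinel value. It assumes, without confirmation from the report, that 0 may stand for "not recorded"; the helper `summarize` and the toy values are hypothetical.

```python
from statistics import median


def summarize(values, sentinel=0):
    """Return (share of sentinel values, median of all values, median without sentinel)."""
    kept = [v for v in values if v != sentinel]
    share = (len(values) - len(kept)) / len(values)
    return share, median(values), median(kept)


# Toy data only; these are not rows from the dataset.
toy_population = [0, 0, 0, 1, 1, 25, 150, 200, 250, 300]
share, med_all, med_nonzero = summarize(toy_population)

print(share, med_all, med_nonzero)
```

If the zeros do turn out to be placeholders, statistics computed on the non-zero subset are likely the more informative ones.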

public_meeting
Boolean

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)< 0.1%
Missing3334
Missing (%)5.6%
Memory size928.1 KiB
True
51011 
False
 
5055
(Missing)
 
3334
ValueCountFrequency (%)
True51011
85.9%
False5055
 
8.5%
(Missing)3334
 
5.6%

recorded_by
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
GeoData Consultants Ltd
59400 

Length

Max length23
Median length23
Mean length23
Min length23

Characters and Unicode

Total characters1366200
Distinct characters14
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowGeoData Consultants Ltd
2nd rowGeoData Consultants Ltd
3rd rowGeoData Consultants Ltd
4th rowGeoData Consultants Ltd
5th rowGeoData Consultants Ltd
ValueCountFrequency (%)
GeoData Consultants Ltd59400
100.0%
Histogram of lengths of the category
ValueCountFrequency (%)
geodata59400
33.3%
consultants59400
33.3%
ltd59400
33.3%

Most occurring characters

ValueCountFrequency (%)
t237600
17.4%
a178200
13.0%
o118800
8.7%
118800
8.7%
n118800
8.7%
s118800
8.7%
G59400
 
4.3%
e59400
 
4.3%
D59400
 
4.3%
C59400
 
4.3%
Other values (4)237600
17.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1009800
73.9%
Uppercase Letter237600
 
17.4%
Space Separator118800
 
8.7%

Most frequent character per category

ValueCountFrequency (%)
t237600
23.5%
a178200
17.6%
o118800
11.8%
n118800
11.8%
s118800
11.8%
e59400
 
5.9%
u59400
 
5.9%
l59400
 
5.9%
d59400
 
5.9%
ValueCountFrequency (%)
G59400
25.0%
D59400
25.0%
C59400
25.0%
L59400
25.0%
ValueCountFrequency (%)
118800
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1247400
91.3%
Common118800
 
8.7%

Most frequent character per script

ValueCountFrequency (%)
t237600
19.0%
a178200
14.3%
o118800
9.5%
n118800
9.5%
s118800
9.5%
G59400
 
4.8%
e59400
 
4.8%
D59400
 
4.8%
C59400
 
4.8%
u59400
 
4.8%
Other values (3)178200
14.3%
ValueCountFrequency (%)
118800
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1366200
100.0%

Most frequent character per block

ValueCountFrequency (%)
t237600
17.4%
a178200
13.0%
o118800
8.7%
118800
8.7%
n118800
8.7%
s118800
8.7%
G59400
 
4.3%
e59400
 
4.3%
D59400
 
4.3%
C59400
 
4.3%
Other values (4)237600
17.4%
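
Because recorded_by holds a single value in all 59400 rows, it carries no information, and the HIGH CORRELATION flags attached to it are presumably a side effect of that. A minimal sketch of detecting such columns before modelling follows; the helper `constant_columns` and the toy table are hypothetical.

```python
def constant_columns(columns):
    """Return the names of columns whose values are all identical."""
    return [name for name, values in columns.items() if len(set(values)) <= 1]


# Toy table: column name -> list of values (not real rows from the dataset).
toy = {
    "recorded_by": ["GeoData Consultants Ltd"] * 4,
    "permit": [True, False, True, None],
}
to_drop = constant_columns(toy)

print(to_drop)
```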

scheme_management
Categorical

HIGH CORRELATION
MISSING

Distinct12
Distinct (%)< 0.1%
Missing3877
Missing (%)6.5%
Memory size928.1 KiB
VWC
36793 
WUG
5206 
Water authority
 
3153
WUA
 
2883
Water Board
 
2748
Other values (7)
4740 

Length

Max length16
Median length3
Mean length4.644723808
Min length3

Characters and Unicode

Total characters257889
Distinct characters29
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1
Unique (%)< 0.1%

Sample

1st rowVWC
2nd rowOther
3rd rowVWC
4th rowVWC
5th rowVWC
ValueCountFrequency (%)
VWC36793
61.9%
WUG5206
 
8.8%
Water authority3153
 
5.3%
WUA2883
 
4.9%
Water Board2748
 
4.6%
Parastatal1680
 
2.8%
Private operator1063
 
1.8%
Company1061
 
1.8%
Other766
 
1.3%
SWC97
 
0.2%
Other values (2)73
 
0.1%
(Missing)3877
 
6.5%
Histogram of lengths of the category
ValueCountFrequency (%)
vwc36793
58.9%
water5901
 
9.4%
wug5206
 
8.3%
authority3153
 
5.0%
wua2883
 
4.6%
board2748
 
4.4%
parastatal1680
 
2.7%
private1063
 
1.7%
operator1063
 
1.7%
company1061
 
1.7%
Other values (4)936
 
1.5%

Most occurring characters

ValueCountFrequency (%)
W50880
19.7%
C37951
14.7%
V36793
14.3%
a21709
8.4%
t18531
 
7.2%
r17509
 
6.8%
o9089
 
3.5%
e8794
 
3.4%
U8089
 
3.1%
6964
 
2.7%
Other values (19)41580
16.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter148229
57.5%
Lowercase Letter102696
39.8%
Space Separator6964
 
2.7%

Most frequent character per category

ValueCountFrequency (%)
a21709
21.1%
t18531
18.0%
r17509
17.0%
o9089
8.9%
e8794
8.6%
i4216
 
4.1%
y4214
 
4.1%
h3919
 
3.8%
u3225
 
3.1%
d2748
 
2.7%
Other values (6)8742
8.5%
ValueCountFrequency (%)
W50880
34.3%
C37951
25.6%
V36793
24.8%
U8089
 
5.5%
G5206
 
3.5%
A2883
 
1.9%
B2748
 
1.9%
P2743
 
1.9%
O766
 
0.5%
S97
 
0.1%
Other values (2)73
 
< 0.1%
ValueCountFrequency (%)
6964
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin250925
97.3%
Common6964
 
2.7%

Most frequent character per script

ValueCountFrequency (%)
W50880
20.3%
C37951
15.1%
V36793
14.7%
a21709
8.7%
t18531
 
7.4%
r17509
 
7.0%
o9089
 
3.6%
e8794
 
3.5%
U8089
 
3.2%
G5206
 
2.1%
Other values (18)36374
14.5%
ValueCountFrequency (%)
6964
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII257889
100.0%

Most frequent character per block

ValueCountFrequency (%)
W50880
19.7%
C37951
14.7%
V36793
14.3%
a21709
8.4%
t18531
 
7.2%
r17509
 
6.8%
o9089
 
3.5%
e8794
 
3.4%
U8089
 
3.1%
6964
 
2.7%
Other values (19)41580
16.1%

scheme_name
Categorical

HIGH CARDINALITY
MISSING

Distinct2696
Distinct (%)8.6%
Missing28166
Missing (%)47.4%
Memory size928.1 KiB
K
 
682
None
 
644
Borehole
 
546
Chalinze wate
 
405
M
 
400
Other values (2691)
28557 

Length

Max length46
Median length13
Mean length14.30521227
Min length1

Characters and Unicode

Total characters446809
Distinct characters68
Distinct categories9
Distinct scripts2
Distinct blocks1

Unique

Unique712
Unique (%)2.3%

Sample

1st rowRoman
2nd rowNyumba ya mungu pipe scheme
3rd rowZingibali
4th rowBL Bondeni
5th rowNone
ValueCountFrequency (%)
K682
 
1.1%
None644
 
1.1%
Borehole546
 
0.9%
Chalinze wate405
 
0.7%
M400
 
0.7%
DANIDA379
 
0.6%
Government320
 
0.5%
Ngana water supplied scheme270
 
0.5%
wanging'ombe water supply s261
 
0.4%
wanging'ombe supply scheme234
 
0.4%
Other values (2686)27093
45.6%
(Missing)28166
47.4%
Histogram of lengths of the category
ValueCountFrequency (%)
water9770
 
13.6%
supply6745
 
9.4%
scheme2532
 
3.5%
wa2157
 
3.0%
gravity1914
 
2.7%
pipe1346
 
1.9%
maji1343
 
1.9%
mradi1097
 
1.5%
line1016
 
1.4%
supplied877
 
1.2%
Other values (2506)43219
60.0%

Most occurring characters

ValueCountFrequency (%)
a48584
 
10.9%
41252
 
9.2%
e35239
 
7.9%
i26411
 
5.9%
p22451
 
5.0%
r21816
 
4.9%
t19216
 
4.3%
u18441
 
4.1%
n17760
 
4.0%
o17418
 
3.9%
Other values (58)178221
39.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter353183
79.0%
Uppercase Letter50064
 
11.2%
Space Separator41252
 
9.2%
Other Punctuation1317
 
0.3%
Dash Punctuation554
 
0.1%
Open Punctuation191
 
< 0.1%
Decimal Number147
 
< 0.1%
Modifier Symbol70
 
< 0.1%
Close Punctuation31
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
a48584
13.8%
e35239
 
10.0%
i26411
 
7.5%
p22451
 
6.4%
r21816
 
6.2%
t19216
 
5.4%
u18441
 
5.2%
n17760
 
5.0%
o17418
 
4.9%
l17308
 
4.9%
Other values (16)108539
30.7%
ValueCountFrequency (%)
M9314
18.6%
K5600
11.2%
N4439
 
8.9%
S3770
 
7.5%
A2729
 
5.5%
I2691
 
5.4%
W2531
 
5.1%
B2387
 
4.8%
L2107
 
4.2%
U1790
 
3.6%
Other values (15)12706
25.4%
ValueCountFrequency (%)
261
41.5%
355
37.4%
77
 
4.8%
17
 
4.8%
47
 
4.8%
54
 
2.7%
03
 
2.0%
63
 
2.0%
ValueCountFrequency (%)
'938
71.2%
/370
 
28.1%
&8
 
0.6%
:1
 
0.1%
ValueCountFrequency (%)
41252
100.0%
ValueCountFrequency (%)
-554
100.0%
ValueCountFrequency (%)
(191
100.0%
ValueCountFrequency (%)
)31
100.0%
ValueCountFrequency (%)
`70
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin403247
90.3%
Common43562
 
9.7%

Most frequent character per script

ValueCountFrequency (%)
a48584
 
12.0%
e35239
 
8.7%
i26411
 
6.5%
p22451
 
5.6%
r21816
 
5.4%
t19216
 
4.8%
u18441
 
4.6%
n17760
 
4.4%
o17418
 
4.3%
l17308
 
4.3%
Other values (41)158603
39.3%
ValueCountFrequency (%)
41252
94.7%
'938
 
2.2%
-554
 
1.3%
/370
 
0.8%
(191
 
0.4%
`70
 
0.2%
261
 
0.1%
355
 
0.1%
)31
 
0.1%
&8
 
< 0.1%
Other values (7)32
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII446809
100.0%

Most frequent character per block

ValueCountFrequency (%)
a48584
 
10.9%
41252
 
9.2%
e35239
 
7.9%
i26411
 
5.9%
p22451
 
5.0%
r21816
 
4.9%
t19216
 
4.3%
u18441
 
4.1%
n17760
 
4.0%
o17418
 
3.9%
Other values (58)178221
39.9%
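
The scheme_name counts list the literal string "None" (644 rows) separately from the 28166 missing cells. If both mean "no scheme name", the effective missing count would be 28166 + 644 = 28810, about 48.5% of 59400 rows. The sketch below normalizes such placeholders; the helper `normalize_missing` and its placeholder list are assumptions made for illustration.

```python
def normalize_missing(value, placeholders=("none", "")):
    """Map placeholder strings to None; otherwise return the stripped value."""
    if value is None:
        return None
    cleaned = value.strip()
    return None if cleaned.lower() in placeholders else cleaned


# The five sample rows shown for scheme_name, plus one genuinely missing cell.
sample = ["Roman", "Nyumba ya mungu pipe scheme", "Zingibali", "BL Bondeni", "None", None]
cleaned = [normalize_missing(v) for v in sample]
missing_share = sum(v is None for v in cleaned) / len(cleaned)

print(cleaned, missing_share)
```

Whether single-letter entries such as "K" and "M" should be treated the same way cannot be decided from the report alone.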

permit
Boolean

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)< 0.1%
Missing3056
Missing (%)5.1%
Memory size928.1 KiB
True
38852 
False
17492 
(Missing)
 
3056
ValueCountFrequency (%)
True38852
65.4%
False17492
29.4%
(Missing)3056
 
5.1%

construction_year
Real number (ℝ≥0)

ZEROS

Distinct55
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1300.652475
Minimum0
Maximum2013
Zeros20709
Zeros (%)34.9%
Memory size928.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1986
Q3: 2004
95th percentile: 2010
Maximum: 2013
Range: 2013
Interquartile range (IQR): 2004

Descriptive statistics

Standard deviation: 951.6205473
Coefficient of variation (CV): 0.7316485885
Kurtosis: -1.596432369
Mean: 1300.652475
Median Absolute Deviation (MAD): 22
Skewness: -0.6349277866
Sum: 77258757
Variance: 905581.6661
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value              Count   Frequency (%)
0                  20709   34.9%
2010               2645    4.5%
2008               2613    4.4%
2009               2533    4.3%
2000               2091    3.5%
2007               1587    2.7%
2006               1471    2.5%
2003               1286    2.2%
2011               1256    2.1%
2004               1123    1.9%
Other values (45)  22086   37.2%

Minimum 5 values
Value  Count  Frequency (%)
0      20709  34.9%
1960   102    0.2%
1961   21     < 0.1%
1962   30     0.1%
1963   85     0.1%

Maximum 5 values
Value  Count  Frequency (%)
2013   176    0.3%
2012   1084   1.8%
2011   1256   2.1%
2010   2645   4.5%
2009   2533   4.3%
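
The reported mean of about 1300 for construction_year is pulled down by the 20709 zeros, which cannot be real years. Since zeros add nothing to the sum, the mean over non-zero years follows directly from the figures in this section. A short worked check, assuming the zeros are placeholders for an unknown year:

```python
# Figures copied from the construction_year summary in this report.
total = 77258757   # Sum
n = 59400          # number of observations
zeros = 20709      # rows with construction_year == 0

mean_all = total / n                # reproduces the reported mean
mean_nonzero = total / (n - zeros)  # mean over rows with a real year

print(f"mean_all={mean_all:.2f} mean_nonzero={mean_nonzero:.2f}")
```

Under that assumption the typical recorded construction year lies in the mid-to-late 1990s, not around 1300.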

extraction_type
Categorical

HIGH CORRELATION

Distinct18
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
gravity
26780 
nira/tanira
8154 
other
6430 
submersible
4764 
swn 80
3670 
Other values (13)
9602 

Length

Max length25
Median length7
Mean length7.719511785
Min length3

Characters and Unicode

Total characters458539
Distinct characters29
Distinct categories5
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowgravity
2nd rowgravity
3rd rowgravity
4th rowsubmersible
5th rowgravity
ValueCountFrequency (%)
gravity26780
45.1%
nira/tanira8154
 
13.7%
other6430
 
10.8%
submersible4764
 
8.0%
swn 803670
 
6.2%
mono2865
 
4.8%
india mark ii2400
 
4.0%
afridev1770
 
3.0%
ksb1415
 
2.4%
other - rope pump451
 
0.8%
Other values (8)701
 
1.2%
Histogram of lengths of the category
ValueCountFrequency (%)
gravity26780
38.1%
nira/tanira8154
 
11.6%
other7197
 
10.2%
submersible4764
 
6.8%
swn3899
 
5.5%
803670
 
5.2%
mono2865
 
4.1%
india2498
 
3.6%
mark2498
 
3.6%
ii2400
 
3.4%
Other values (13)5640
 
8.0%

Most occurring characters

ValueCountFrequency (%)
i60078
13.1%
r59768
13.0%
a58179
12.7%
t42131
9.2%
v28550
 
6.2%
y26867
 
5.9%
g26782
 
5.8%
n25691
 
5.6%
e19036
 
4.2%
s14844
 
3.2%
Other values (19)96613
21.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter430853
94.0%
Space Separator10965
 
2.4%
Other Punctuation8156
 
1.8%
Decimal Number7798
 
1.7%
Dash Punctuation767
 
0.2%

Most frequent character per category

ValueCountFrequency (%)
i60078
13.9%
r59768
13.9%
a58179
13.5%
t42131
9.8%
v28550
6.6%
y26867
 
6.2%
g26782
 
6.2%
n25691
 
6.0%
e19036
 
4.4%
s14844
 
3.4%
Other values (13)68927
16.0%
ValueCountFrequency (%)
83899
50.0%
03670
47.1%
1229
 
2.9%
ValueCountFrequency (%)
10965
100.0%
ValueCountFrequency (%)
/8156
100.0%
ValueCountFrequency (%)
-767
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin430853
94.0%
Common27686
 
6.0%

Most frequent character per script

ValueCountFrequency (%)
i60078
13.9%
r59768
13.9%
a58179
13.5%
t42131
9.8%
v28550
6.6%
y26867
 
6.2%
g26782
 
6.2%
n25691
 
6.0%
e19036
 
4.4%
s14844
 
3.4%
Other values (13)68927
16.0%
ValueCountFrequency (%)
10965
39.6%
/8156
29.5%
83899
 
14.1%
03670
 
13.3%
-767
 
2.8%
1229
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII458539
100.0%

Most frequent character per block

ValueCountFrequency (%)
i60078
13.1%
r59768
13.0%
a58179
12.7%
t42131
9.2%
v28550
 
6.2%
y26867
 
5.9%
g26782
 
5.8%
n25691
 
5.6%
e19036
 
4.2%
s14844
 
3.2%
Other values (19)96613
21.1%

extraction_type_group
Categorical

HIGH CORRELATION

Distinct13
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
gravity
26780 
nira/tanira
8154 
other
6430 
submersible
6179 
swn 80
3670 
Other values (8)
8187 

Length

Max length15
Median length7
Mean length7.880538721
Min length4

Characters and Unicode

Total characters468104
Distinct characters26
Distinct categories5
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowgravity
2nd rowgravity
3rd rowgravity
4th rowsubmersible
5th rowgravity
ValueCountFrequency (%)
gravity26780
45.1%
nira/tanira8154
 
13.7%
other6430
 
10.8%
submersible6179
 
10.4%
swn 803670
 
6.2%
mono2865
 
4.8%
india mark ii2400
 
4.0%
afridev1770
 
3.0%
rope pump451
 
0.8%
other handpump364
 
0.6%
Other values (3)337
 
0.6%
Histogram of lengths of the category
ValueCountFrequency (%)
gravity26780
38.8%
nira/tanira8154
 
11.8%
other6916
 
10.0%
submersible6179
 
9.0%
swn3670
 
5.3%
803670
 
5.3%
mono2865
 
4.2%
india2498
 
3.6%
mark2498
 
3.6%
ii2400
 
3.5%
Other values (7)3373
 
4.9%

Most occurring characters

ValueCountFrequency (%)
i61244
13.1%
r61141
13.1%
a58372
12.5%
t41972
9.0%
v28550
 
6.1%
g26780
 
5.7%
y26780
 
5.7%
n25822
 
5.5%
e21729
 
4.6%
s16028
 
3.4%
Other values (16)99686
21.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter442890
94.6%
Space Separator9603
 
2.1%
Other Punctuation8154
 
1.7%
Decimal Number7340
 
1.6%
Dash Punctuation117
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
i61244
13.8%
r61141
13.8%
a58372
13.2%
t41972
9.5%
v28550
 
6.4%
g26780
 
6.0%
y26780
 
6.0%
n25822
 
5.8%
e21729
 
4.9%
s16028
 
3.6%
Other values (11)74472
16.8%
ValueCountFrequency (%)
83670
50.0%
03670
50.0%
ValueCountFrequency (%)
9603
100.0%
ValueCountFrequency (%)
/8154
100.0%
ValueCountFrequency (%)
-117
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin442890
94.6%
Common25214
 
5.4%

Most frequent character per script

ValueCountFrequency (%)
i61244
13.8%
r61141
13.8%
a58372
13.2%
t41972
9.5%
v28550
 
6.4%
g26780
 
6.0%
y26780
 
6.0%
n25822
 
5.8%
e21729
 
4.9%
s16028
 
3.6%
Other values (11)74472
16.8%
ValueCountFrequency (%)
9603
38.1%
/8154
32.3%
83670
 
14.6%
03670
 
14.6%
-117
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII468104
100.0%

Most frequent character per block

ValueCountFrequency (%)
i61244
13.1%
r61141
13.1%
a58372
12.5%
t41972
9.0%
v28550
 
6.1%
g26780
 
5.7%
y26780
 
5.7%
n25822
 
5.5%
e21729
 
4.6%
s16028
 
3.4%
Other values (16)99686
21.3%

extraction_type_class
Categorical

HIGH CORRELATION

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
gravity
26780 
handpump
16456 
other
6430 
submersible
6179 
motorpump
2987 
Other values (2)
 
568

Length

Max length12
Median length7
Mean length7.602239057
Min length5

Characters and Unicode

Total characters451573
Distinct characters21
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowgravity
2nd rowgravity
3rd rowgravity
4th rowsubmersible
5th rowgravity
ValueCountFrequency (%)
gravity26780
45.1%
handpump16456
27.7%
other6430
 
10.8%
submersible6179
 
10.4%
motorpump2987
 
5.0%
rope pump451
 
0.8%
wind-powered117
 
0.2%
Histogram of lengths of the category
ValueCountFrequency (%)
gravity26780
44.7%
handpump16456
27.5%
other6430
 
10.7%
submersible6179
 
10.3%
motorpump2987
 
5.0%
pump451
 
0.8%
rope451
 
0.8%
wind-powered117
 
0.2%

Most occurring characters

ValueCountFrequency (%)
a43236
 
9.6%
r42944
 
9.5%
p40356
 
8.9%
t36197
 
8.0%
i33076
 
7.3%
m29060
 
6.4%
g26780
 
5.9%
v26780
 
5.9%
y26780
 
5.9%
u26073
 
5.8%
Other values (11)120291
26.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter451005
99.9%
Space Separator451
 
0.1%
Dash Punctuation117
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
a43236
 
9.6%
r42944
 
9.5%
p40356
 
8.9%
t36197
 
8.0%
i33076
 
7.3%
m29060
 
6.4%
g26780
 
5.9%
v26780
 
5.9%
y26780
 
5.9%
u26073
 
5.8%
Other values (9)119723
26.5%
ValueCountFrequency (%)
-117
100.0%
ValueCountFrequency (%)
451
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin451005
99.9%
Common568
 
0.1%

Most frequent character per script

ValueCountFrequency (%)
a43236
 
9.6%
r42944
 
9.5%
p40356
 
8.9%
t36197
 
8.0%
i33076
 
7.3%
m29060
 
6.4%
g26780
 
5.9%
v26780
 
5.9%
y26780
 
5.9%
u26073
 
5.8%
Other values (9)119723
26.5%
ValueCountFrequency (%)
451
79.4%
-117
 
20.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII451573
100.0%

Most frequent character per block

ValueCountFrequency (%)
a43236
 
9.6%
r42944
 
9.5%
p40356
 
8.9%
t36197
 
8.0%
i33076
 
7.3%
m29060
 
6.4%
g26780
 
5.9%
v26780
 
5.9%
y26780
 
5.9%
u26073
 
5.8%
Other values (11)120291
26.6%
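
The counts for extraction_type, extraction_type_group and extraction_type_class suggest a hierarchy: for example, submersible (4764) plus ksb (1415) in extraction_type equals submersible (6179) in extraction_type_group. The sketch below checks whether one column is a strict refinement of another. The helper `is_refinement` is hypothetical, and the toy pairs encode a mapping inferred from the counts, not one confirmed by the data.

```python
from collections import defaultdict


def is_refinement(fine, coarse):
    """True if every value in `fine` maps to exactly one value in `coarse`."""
    mapping = defaultdict(set)
    for f, c in zip(fine, coarse):
        mapping[f].add(c)
    return all(len(targets) == 1 for targets in mapping.values())


# Toy rows; the ksb -> submersible pairing is an assumption based on the counts.
extraction_type = ["gravity", "ksb", "submersible", "swn 80", "other - rope pump"]
extraction_type_group = ["gravity", "submersible", "submersible", "swn 80", "rope pump"]

forward = is_refinement(extraction_type, extraction_type_group)
backward = is_refinement(extraction_type_group, extraction_type)

print(forward, backward)
```

If the check holds on the full data, the coarser columns add no information beyond the finest one, which would be consistent with the HIGH CORRELATION flags on all three.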

management
Categorical

HIGH CORRELATION

Distinct12
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
vwc
40507 
wug
6515 
water board
 
2933
wua
 
2535
private operator
 
1971
Other values (7)
4939 

Length

Max length16
Median length3
Mean length4.350639731
Min length3

Characters and Unicode

Total characters258428
Distinct characters23
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowvwc
2nd rowwug
3rd rowvwc
4th rowvwc
5th rowother
ValueCountFrequency (%)
vwc40507
68.2%
wug6515
 
11.0%
water board2933
 
4.9%
wua2535
 
4.3%
private operator1971
 
3.3%
parastatal1768
 
3.0%
water authority904
 
1.5%
other844
 
1.4%
company685
 
1.2%
unknown561
 
0.9%
Other values (2)177
 
0.3%
Histogram of lengths of the category
ValueCountFrequency (%)
vwc40507
61.9%
wug6515
 
10.0%
water3837
 
5.9%
board2933
 
4.5%
wua2535
 
3.9%
private1971
 
3.0%
operator1971
 
3.0%
parastatal1768
 
2.7%
other943
 
1.4%
authority904
 
1.4%
Other values (5)1522
 
2.3%

Most occurring characters

ValueCountFrequency (%)
w53955
20.9%
v42478
16.4%
c41291
16.0%
a21908
8.5%
r16376
 
6.3%
t14222
 
5.5%
u10593
 
4.1%
o10166
 
3.9%
e8722
 
3.4%
g6515
 
2.5%
Other values (13)32202
12.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter252323
97.6%
Space Separator6006
 
2.3%
Dash Punctuation99
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
w53955
21.4%
v42478
16.8%
c41291
16.4%
a21908
8.7%
r16376
 
6.5%
t14222
 
5.6%
u10593
 
4.2%
o10166
 
4.0%
e8722
 
3.5%
g6515
 
2.6%
Other values (11)26097
10.3%
ValueCountFrequency (%)
6006
100.0%
ValueCountFrequency (%)
-99
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin252323
97.6%
Common6105
 
2.4%

Most frequent character per script

ValueCountFrequency (%)
w53955
21.4%
v42478
16.8%
c41291
16.4%
a21908
8.7%
r16376
 
6.5%
t14222
 
5.6%
u10593
 
4.2%
o10166
 
4.0%
e8722
 
3.5%
g6515
 
2.6%
Other values (11)26097
10.3%
ValueCountFrequency (%)
6006
98.4%
-99
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII258428
100.0%

Most frequent character per block

ValueCountFrequency (%)
w53955
20.9%
v42478
16.4%
c41291
16.0%
a21908
8.5%
r16376
 
6.3%
t14222
 
5.5%
u10593
 
4.1%
o10166
 
3.9%
e8722
 
3.4%
g6515
 
2.5%
Other values (13)32202
12.5%

management_group
Categorical

HIGH CORRELATION

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
user-group
52490 
commercial
 
3638
parastatal
 
1768
other
 
943
unknown
 
561

Length

Max length10
Median length10
Mean length9.892289562
Min length5

Characters and Unicode

Total characters587602
Distinct characters18
Distinct categories2
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowuser-group
2nd rowuser-group
3rd rowuser-group
4th rowuser-group
5th rowother
ValueCountFrequency (%)
user-group52490
88.4%
commercial3638
 
6.1%
parastatal1768
 
3.0%
other943
 
1.6%
unknown561
 
0.9%
Histogram of lengths of the category
ValueCountFrequency (%)
user-group52490
88.4%
commercial3638
 
6.1%
parastatal1768
 
3.0%
other943
 
1.6%
unknown561
 
0.9%

Most occurring characters

ValueCountFrequency (%)
r111329
18.9%
u105541
18.0%
o57632
9.8%
e57071
9.7%
s54258
9.2%
p54258
9.2%
-52490
8.9%
g52490
8.9%
a10710
 
1.8%
c7276
 
1.2%
Other values (8)24547
 
4.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter535112
91.1%
Dash Punctuation52490
 
8.9%

Most frequent character per category

ValueCountFrequency (%)
r111329
20.8%
u105541
19.7%
o57632
10.8%
e57071
10.7%
s54258
10.1%
p54258
10.1%
g52490
9.8%
a10710
 
2.0%
c7276
 
1.4%
m7276
 
1.4%
Other values (7)17271
 
3.2%
ValueCountFrequency (%)
-52490
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin535112
91.1%
Common52490
 
8.9%

Most frequent character per script

ValueCountFrequency (%)
r111329
20.8%
u105541
19.7%
o57632
10.8%
e57071
10.7%
s54258
10.1%
p54258
10.1%
g52490
9.8%
a10710
 
2.0%
c7276
 
1.4%
m7276
 
1.4%
Other values (7)17271
 
3.2%
ValueCountFrequency (%)
-52490
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII587602
100.0%

Most frequent character per block

ValueCountFrequency (%)
r111329
18.9%
u105541
18.0%
o57632
9.8%
e57071
9.7%
s54258
9.2%
p54258
9.2%
-52490
8.9%
g52490
8.9%
a10710
 
1.8%
c7276
 
1.2%
Other values (8)24547
 
4.2%

payment
Categorical

HIGH CORRELATION

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size928.1 KiB
never pay
25348 
pay per bucket
8985 
pay monthly
8300 
unknown
8157 
pay when scheme fails
3914 
Other values (2)
4696 

Length

Max length21
Median length9
Mean length10.66479798
Min length5

Characters and Unicode

Total characters633489
Distinct characters21
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowpay annually
2nd rownever pay
3rd rowpay per bucket
4th rownever pay
5th rownever pay
ValueCountFrequency (%)
never pay25348
42.7%
pay per bucket8985
 
15.1%
pay monthly8300
 
14.0%
unknown8157
 
13.7%
pay when scheme fails3914
 
6.6%
pay annually3642
 
6.1%
other1054
 
1.8%
Histogram of lengths of the category
Value     Count  Frequency (%)
pay       50189  39.7%
never     25348  20.1%
bucket    8985   7.1%
per       8985   7.1%
monthly   8300   6.6%
unknown   8157   6.5%
when      3914   3.1%
scheme    3914   3.1%
fails     3914   3.1%
annually  3642   2.9%
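
The word counts above are consistent with splitting each category on whitespace and weighting every word by the category's frequency. The sketch below assumes that tokenisation; `PAYMENT_COUNTS` and `word_counts` are hypothetical names, and the counts are copied from the `payment` frequency table.

```python
from collections import Counter

# Value counts for `payment`, copied from the frequency table of this variable.
PAYMENT_COUNTS = {
    "never pay": 25348,
    "pay per bucket": 8985,
    "pay monthly": 8300,
    "unknown": 8157,
    "pay when scheme fails": 3914,
    "pay annually": 3642,
    "other": 1054,
}


def word_counts(value_counts):
    """Count whitespace-separated words, weighted by value frequency (assumed tokenisation)."""
    words = Counter()
    for value, count in value_counts.items():
        for word in value.split():
            words[word] += count
    return words


words = word_counts(PAYMENT_COUNTS)
total_words = sum(words.values())
# Share of each word among all words, in percent, rounded to one decimal.
shares = {word: round(100 * n / total_words, 1) for word, n in words.items()}
print(words.most_common(3), shares["pay"])
```

Under this assumption "pay" occurs 50189 times out of 126402 words, which matches the 39.7% shown in the table.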

Most occurring characters

Value              Count   Frequency (%)
e                  81462   12.9%
n                  69317   10.9%
(space)            67002   10.6%
y                  62131   9.8%
a                  61387   9.7%
p                  59174   9.3%
r                  35387   5.6%
v                  25348   4.0%
u                  20784   3.3%
l                  19498   3.1%
Other values (11)  131999  20.8%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   566487  89.4%
Space Separator    67002   10.6%

Most frequent character per category

Lowercase Letter

Value              Count   Frequency (%)
e                  81462   14.4%
n                  69317   12.2%
y                  62131   11.0%
a                  61387   10.8%
p                  59174   10.4%
r                  35387   6.2%
v                  25348   4.5%
u                  20784   3.7%
l                  19498   3.4%
t                  18339   3.2%
Other values (10)  113660  20.1%

Space Separator

Value              Count   Frequency (%)
(space)            67002   100.0%

Most occurring scripts

Value              Count   Frequency (%)
Latin              566487  89.4%
Common             67002   10.6%

Most frequent character per script

Latin

Value              Count   Frequency (%)
e                  81462   14.4%
n                  69317   12.2%
y                  62131   11.0%
a                  61387   10.8%
p                  59174   10.4%
r                  35387   6.2%
v                  25348   4.5%
u                  20784   3.7%
l                  19498   3.4%
t                  18339   3.2%
Other values (10)  113660  20.1%

Common

Value              Count   Frequency (%)
(space)            67002   100.0%

Most occurring blocks

Value              Count   Frequency (%)
ASCII              633489  100.0%

Most frequent character per block

Value              Count   Frequency (%)
e                  81462   12.9%
n                  69317   10.9%
(space)            67002   10.6%
y                  62131   9.8%
a                  61387   9.7%
p                  59174   9.3%
r                  35387   5.6%
v                  25348   4.0%
u                  20784   3.3%
l                  19498   3.1%
Other values (11)  131999  20.8%

payment_type
Categorical

HIGH CORRELATION

Distinct      7
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   928.1 KiB

Value             Count
never pay         25348
per bucket        8985
monthly           8300
unknown           8157
on failure        3914
Other values (2)  4696

Length

Max length     10
Median length  9
Mean length    8.530757576
Min length     5

Characters and Unicode

Total characters     506727
Distinct characters  20
Distinct categories  2
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  annually
2nd row  never pay
3rd row  per bucket
4th row  never pay
5th row  never pay
Value       Count  Frequency (%)
never pay   25348  42.7%
per bucket  8985   15.1%
monthly     8300   14.0%
unknown     8157   13.7%
on failure  3914   6.6%
annually    3642   6.1%
other       1054   1.8%
Histogram of lengths of the category
Value     Count  Frequency (%)
pay       25348  26.0%
never     25348  26.0%
bucket    8985   9.2%
per       8985   9.2%
monthly   8300   8.5%
unknown   8157   8.4%
on        3914   4.0%
failure   3914   4.0%
annually  3642   3.7%
other     1054   1.1%

Most occurring characters

Value              Count   Frequency (%)
e                  73634   14.5%
n                  69317   13.7%
r                  39301   7.8%
(space)            38247   7.5%
y                  37290   7.4%
a                  36546   7.2%
p                  34333   6.8%
v                  25348   5.0%
u                  24698   4.9%
o                  21425   4.2%
Other values (10)  106588  21.0%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   468480  92.5%
Space Separator    38247   7.5%

Most frequent character per category

Lowercase Letter

Value              Count   Frequency (%)
e                  73634   15.7%
n                  69317   14.8%
r                  39301   8.4%
y                  37290   8.0%
a                  36546   7.8%
p                  34333   7.3%
v                  25348   5.4%
u                  24698   5.3%
o                  21425   4.6%
l                  19498   4.2%
Other values (9)   87090   18.6%

Space Separator

Value              Count   Frequency (%)
(space)            38247   100.0%

Most occurring scripts

Value              Count   Frequency (%)
Latin              468480  92.5%
Common             38247   7.5%

Most frequent character per script

Latin

Value              Count   Frequency (%)
e                  73634   15.7%
n                  69317   14.8%
r                  39301   8.4%
y                  37290   8.0%
a                  36546   7.8%
p                  34333   7.3%
v                  25348   5.4%
u                  24698   5.3%
o                  21425   4.6%
l                  19498   4.2%
Other values (9)   87090   18.6%

Common

Value              Count   Frequency (%)
(space)            38247   100.0%

Most occurring blocks

Value              Count   Frequency (%)
ASCII              506727  100.0%

Most frequent character per block

Value              Count   Frequency (%)
e                  73634   14.5%
n                  69317   13.7%
r                  39301   7.8%
(space)            38247   7.5%
y                  37290   7.4%
a                  36546   7.2%
p                  34333   6.8%
v                  25348   5.0%
u                  24698   4.9%
o                  21425   4.2%
Other values (10)  106588  21.0%

water_quality
Categorical

HIGH CORRELATION

Distinct      8
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   928.1 KiB

Value             Count
soft              50818
salty             4856
unknown           1876
milky             804
coloured          490
Other values (3)  556

Length

Max length     18
Median length  4
Mean length    4.303282828
Min length     4

Characters and Unicode

Total characters     255615
Distinct characters  19
Distinct categories  2
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  soft
2nd row  soft
3rd row  soft
4th row  soft
5th row  soft
Value               Count  Frequency (%)
soft                50818  85.6%
salty               4856   8.2%
unknown             1876   3.2%
milky               804    1.4%
coloured            490    0.8%
salty abandoned     339    0.6%
fluoride            200    0.3%
fluoride abandoned  17     < 0.1%
Histogram of lengths of the category
Value      Count  Frequency (%)
soft       50818  85.0%
salty      5195   8.7%
unknown    1876   3.1%
milky      804    1.3%
coloured   490    0.8%
abandoned  356    0.6%
fluoride   217    0.4%

Most occurring characters

Value              Count   Frequency (%)
s                  56013   21.9%
t                  56013   21.9%
o                  54247   21.2%
f                  51035   20.0%
l                  6706    2.6%
n                  6340    2.5%
y                  5999    2.3%
a                  5907    2.3%
k                  2680    1.0%
u                  2583    1.0%
Other values (9)   8092    3.2%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   255259  99.9%
Space Separator    356     0.1%

Most frequent character per category

Lowercase Letter

Value              Count   Frequency (%)
s                  56013   21.9%
t                  56013   21.9%
o                  54247   21.3%
f                  51035   20.0%
l                  6706    2.6%
n                  6340    2.5%
y                  5999    2.4%
a                  5907    2.3%
k                  2680    1.0%
u                  2583    1.0%
Other values (8)   7736    3.0%

Space Separator

Value              Count   Frequency (%)
(space)            356     100.0%

Most occurring scripts

Value              Count   Frequency (%)
Latin              255259  99.9%
Common             356     0.1%

Most frequent character per script

Latin

Value              Count   Frequency (%)
s                  56013   21.9%
t                  56013   21.9%
o                  54247   21.3%
f                  51035   20.0%
l                  6706    2.6%
n                  6340    2.5%
y                  5999    2.4%
a                  5907    2.3%
k                  2680    1.0%
u                  2583    1.0%
Other values (8)   7736    3.0%

Common

Value              Count   Frequency (%)
(space)            356     100.0%

Most occurring blocks

Value              Count   Frequency (%)
ASCII              255615  100.0%

Most frequent character per block

Value              Count   Frequency (%)
s                  56013   21.9%
t                  56013   21.9%
o                  54247   21.2%
f                  51035   20.0%
l                  6706    2.6%
n                  6340    2.5%
y                  5999    2.3%
a                  5907    2.3%
k                  2680    1.0%
u                  2583    1.0%
Other values (9)   8092    3.2%

quality_group
Categorical

HIGH CORRELATION

Distinct      6
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   928.1 KiB

Value    Count
good     50818
salty    5195
unknown  1876
milky    804
colored  490

Length

Max length     8
Median length  4
Mean length    4.23510101
Min length     4

Characters and Unicode

Total characters     251565
Distinct characters  18
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  good
2nd row  good
3rd row  good
4th row  good
5th row  good
Value     Count  Frequency (%)
good      50818  85.6%
salty     5195   8.7%
unknown   1876   3.2%
milky     804    1.4%
colored   490    0.8%
fluoride  217    0.4%
Histogram of lengths of the category
Value     Count  Frequency (%)
good      50818  85.6%
salty     5195   8.7%
unknown   1876   3.2%
milky     804    1.4%
colored   490    0.8%
fluoride  217    0.4%

Most occurring characters

Value              Count   Frequency (%)
o                  104709  41.6%
d                  51525   20.5%
g                  50818   20.2%
l                  6706    2.7%
y                  5999    2.4%
n                  5628    2.2%
s                  5195    2.1%
a                  5195    2.1%
t                  5195    2.1%
k                  2680    1.1%
Other values (8)   7915    3.1%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   251565  100.0%

Most frequent character per category

Lowercase Letter

Value              Count   Frequency (%)
o                  104709  41.6%
d                  51525   20.5%
g                  50818   20.2%
l                  6706    2.7%
y                  5999    2.4%
n                  5628    2.2%
s                  5195    2.1%
a                  5195    2.1%
t                  5195    2.1%
k                  2680    1.1%
Other values (8)   7915    3.1%

Most occurring scripts

Value              Count   Frequency (%)
Latin              251565  100.0%

Most frequent character per script

Latin

Value              Count   Frequency (%)
o                  104709  41.6%
d                  51525   20.5%
g                  50818   20.2%
l                  6706    2.7%
y                  5999    2.4%
n                  5628    2.2%
s                  5195    2.1%
a                  5195    2.1%
t                  5195    2.1%
k                  2680    1.1%
Other values (8)   7915    3.1%

Most occurring blocks

Value              Count   Frequency (%)
ASCII              251565  100.0%

Most frequent character per block

Value              Count   Frequency (%)
o                  104709  41.6%
d                  51525   20.5%
g                  50818   20.2%
l                  6706    2.7%
y                  5999    2.4%
n                  5628    2.2%
s                  5195    2.1%
a                  5195    2.1%
t                  5195    2.1%
k                  2680    1.1%
Other values (8)   7915    3.1%

quantity
Categorical

HIGH CORRELATION

Distinct      5
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   928.1 KiB

Value         Count
enough        33186
insufficient  15129
dry           6246
seasonal      4050
unknown       789

Length

Max length     12
Median length  6
Mean length    7.362373737
Min length     3

Characters and Unicode

Total characters     437325
Distinct characters  18
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  enough
2nd row  insufficient
3rd row  enough
4th row  dry
5th row  seasonal
Value         Count  Frequency (%)
enough        33186  55.9%
insufficient  15129  25.5%
dry           6246   10.5%
seasonal      4050   6.8%
unknown       789    1.3%
Histogram of lengths of the category
Value         Count  Frequency (%)
enough        33186  55.9%
insufficient  15129  25.5%
dry           6246   10.5%
seasonal      4050   6.8%
unknown       789    1.3%

Most occurring characters

Value              Count   Frequency (%)
n                  69861   16.0%
e                  52365   12.0%
u                  49104   11.2%
i                  45387   10.4%
o                  38025   8.7%
g                  33186   7.6%
h                  33186   7.6%
f                  30258   6.9%
s                  23229   5.3%
c                  15129   3.5%
Other values (8)   47595   10.9%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   437325  100.0%

Most frequent character per category

Lowercase Letter

Value              Count   Frequency (%)
n                  69861   16.0%
e                  52365   12.0%
u                  49104   11.2%
i                  45387   10.4%
o                  38025   8.7%
g                  33186   7.6%
h                  33186   7.6%
f                  30258   6.9%
s                  23229   5.3%
c                  15129   3.5%
Other values (8)   47595   10.9%

Most occurring scripts

Value              Count   Frequency (%)
Latin              437325  100.0%

Most frequent character per script

Latin

Value              Count   Frequency (%)
n                  69861   16.0%
e                  52365   12.0%
u                  49104   11.2%
i                  45387   10.4%
o                  38025   8.7%
g                  33186   7.6%
h                  33186   7.6%
f                  30258   6.9%
s                  23229   5.3%
c                  15129   3.5%
Other values (8)   47595   10.9%

Most occurring blocks

Value              Count   Frequency (%)
ASCII              437325  100.0%

Most frequent character per block

Value              Count   Frequency (%)
n                  69861   16.0%
e                  52365   12.0%
u                  49104   11.2%
i                  45387   10.4%
o                  38025   8.7%
g                  33186   7.6%
h                  33186   7.6%
f                  30258   6.9%
s                  23229   5.3%
c                  15129   3.5%
Other values (8)   47595   10.9%

quantity_group
Categorical

HIGH CORRELATION

Distinct      5
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   928.1 KiB

Value         Count
enough        33186
insufficient  15129
dry           6246
seasonal      4050
unknown       789

Length

Max length     12
Median length  6
Mean length    7.362373737
Min length     3

Characters and Unicode

Total characters     437325
Distinct characters  18
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  enough
2nd row  insufficient
3rd row  enough
4th row  dry
5th row  seasonal
Value         Count  Frequency (%)
enough        33186  55.9%
insufficient  15129  25.5%
dry           6246   10.5%
seasonal      4050   6.8%
unknown       789    1.3%
Histogram of lengths of the category
Value         Count  Frequency (%)
enough        33186  55.9%
insufficient  15129  25.5%
dry           6246   10.5%
seasonal      4050   6.8%
unknown       789    1.3%

Most occurring characters

Value              Count   Frequency (%)
n                  69861   16.0%
e                  52365   12.0%
u                  49104   11.2%
i                  45387   10.4%
o                  38025   8.7%
g                  33186   7.6%
h                  33186   7.6%
f                  30258   6.9%
s                  23229   5.3%
c                  15129   3.5%
Other values (8)   47595   10.9%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   437325  100.0%

Most frequent character per category

Lowercase Letter

Value              Count   Frequency (%)
n                  69861   16.0%
e                  52365   12.0%
u                  49104   11.2%
i                  45387   10.4%
o                  38025   8.7%
g                  33186   7.6%
h                  33186   7.6%
f                  30258   6.9%
s                  23229   5.3%
c                  15129   3.5%
Other values (8)   47595   10.9%

Most occurring scripts

Value              Count   Frequency (%)
Latin              437325  100.0%

Most frequent character per script

Latin

Value              Count   Frequency (%)
n                  69861   16.0%
e                  52365   12.0%
u                  49104   11.2%
i                  45387   10.4%
o                  38025   8.7%
g                  33186   7.6%
h                  33186   7.6%
f                  30258   6.9%
s                  23229   5.3%
c                  15129   3.5%
Other values (8)   47595   10.9%

Most occurring blocks

Value              Count   Frequency (%)
ASCII              437325  100.0%

Most frequent character per block

Value              Count   Frequency (%)
n                  69861   16.0%
e                  52365   12.0%
u                  49104   11.2%
i                  45387   10.4%
o                  38025   8.7%
g                  33186   7.6%
h                  33186   7.6%
f                  30258   6.9%
s                  23229   5.3%
c                  15129   3.5%
Other values (8)   47595   10.9%

source
Categorical

HIGH CORRELATION

Distinct      10
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   928.1 KiB

Value                 Count
spring                17021
shallow well          16824
machine dbh           11075
river                 9612
rainwater harvesting  2295
Other values (5)      2573

Length

Max length     20
Median length  11
Mean length    8.978804714
Min length     3

Characters and Unicode

Total characters     533341
Distinct characters  21
Distinct categories  2
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  spring
2nd row  rainwater harvesting
3rd row  dam
4th row  machine dbh
5th row  rainwater harvesting
Value                 Count  Frequency (%)
spring                17021  28.7%
shallow well          16824  28.3%
machine dbh           11075  18.6%
river                 9612   16.2%
rainwater harvesting  2295   3.9%
hand dtw              874    1.5%
lake                  765    1.3%
dam                   656    1.1%
other                 212    0.4%
unknown               66     0.1%
Histogram of lengths of the category
Value             Count  Frequency (%)
spring            17021  18.8%
shallow           16824  18.6%
well              16824  18.6%
dbh               11075  12.2%
machine           11075  12.2%
river             9612   10.6%
rainwater         2295   2.5%
harvesting        2295   2.5%
hand              874    1.0%
dtw               874    1.0%
Other values (4)  1699   1.9%

Most occurring characters

Value              Count   Frequency (%)
l                  68061   12.8%
r                  43342   8.1%
e                  43078   8.1%
h                  42355   7.9%
i                  42298   7.9%
a                  37079   7.0%
w                  36883   6.9%
s                  36140   6.8%
n                  33758   6.3%
(space)            31068   5.8%
Other values (11)  119279  22.4%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   502273  94.2%
Space Separator    31068   5.8%

Most frequent character per category

Lowercase Letter

Value              Count   Frequency (%)
l                  68061   13.6%
r                  43342   8.6%
e                  43078   8.6%
h                  42355   8.4%
i                  42298   8.4%
a                  37079   7.4%
w                  36883   7.3%
s                  36140   7.2%
n                  33758   6.7%
g                  19316   3.8%
Other values (10)  99963   19.9%

Space Separator

Value              Count   Frequency (%)
(space)            31068   100.0%

Most occurring scripts

Value              Count   Frequency (%)
Latin              502273  94.2%
Common             31068   5.8%

Most frequent character per script

Latin

Value              Count   Frequency (%)
l                  68061   13.6%
r                  43342   8.6%
e                  43078   8.6%
h                  42355   8.4%
i                  42298   8.4%
a                  37079   7.4%
w                  36883   7.3%
s                  36140   7.2%
n                  33758   6.7%
g                  19316   3.8%
Other values (10)  99963   19.9%

Common

Value              Count   Frequency (%)
(space)            31068   100.0%

Most occurring blocks

Value              Count   Frequency (%)
ASCII              533341  100.0%

Most frequent character per block

Value              Count   Frequency (%)
l                  68061   12.8%
r                  43342   8.1%
e                  43078   8.1%
h                  42355   7.9%
i                  42298   7.9%
a                  37079   7.0%
w                  36883   6.9%
s                  36140   6.8%
n                  33758   6.3%
(space)            31068   5.8%
Other values (11)  119279  22.4%

source_type
Categorical

HIGH CORRELATION

Distinct      7
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   928.1 KiB

Value                 Count
spring                17021
shallow well          16824
borehole              11949
river/lake            10377
rainwater harvesting  2295
Other values (2)      934

Length

Max length     20
Median length  8
Mean length    9.303602694
Min length     3

Characters and Unicode

Total characters     552634
Distinct characters  20
Distinct categories  3
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  spring
2nd row  rainwater harvesting
3rd row  dam
4th row  borehole
5th row  rainwater harvesting
Value                 Count  Frequency (%)
spring                17021  28.7%
shallow well          16824  28.3%
borehole              11949  20.1%
river/lake            10377  17.5%
rainwater harvesting  2295   3.9%
dam                   656    1.1%
other                 278    0.5%
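
The counts in this table are consistent with `source_type` being a coarser grouping of `source`: for example, 11075 (machine dbh) + 874 (hand dtw) = 11949 (borehole). The sketch below checks that reading. The mapping `SOURCE_TO_TYPE` is an assumption inferred from the counts alone, not something the report states, and `group_counts` is a hypothetical helper.

```python
from collections import Counter

# Value counts for `source`, copied from that variable's frequency table.
SOURCE_COUNTS = {
    "spring": 17021,
    "shallow well": 16824,
    "machine dbh": 11075,
    "river": 9612,
    "rainwater harvesting": 2295,
    "hand dtw": 874,
    "lake": 765,
    "dam": 656,
    "other": 212,
    "unknown": 66,
}

# Assumed grouping, inferred only from matching counts; unlisted values map to themselves.
SOURCE_TO_TYPE = {
    "machine dbh": "borehole",
    "hand dtw": "borehole",
    "river": "river/lake",
    "lake": "river/lake",
    "unknown": "other",
}


def group_counts(value_counts, mapping):
    """Aggregate value counts under a coarser label mapping."""
    grouped = Counter()
    for value, count in value_counts.items():
        grouped[mapping.get(value, value)] += count
    return grouped


grouped = group_counts(SOURCE_COUNTS, SOURCE_TO_TYPE)
print(grouped.most_common())
```

If the assumed mapping holds row by row in the data, one of the two columns carries no extra information, which would be in line with the HIGH CORRELATION flag on both variables.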
Histogram of lengths of the category
Value       Count  Frequency (%)
spring      17021  21.7%
shallow     16824  21.4%
well        16824  21.4%
borehole    11949  15.2%
river/lake  10377  13.2%
rainwater   2295   2.9%
harvesting  2295   2.9%
dam         656    0.8%
other       278    0.4%

Most occurring characters

Value              Count   Frequency (%)
l                  89622   16.2%
e                  66344   12.0%
r                  56887   10.3%
o                  41000   7.4%
s                  36140   6.5%
w                  35943   6.5%
a                  34742   6.3%
i                  31988   5.8%
h                  31346   5.7%
n                  21611   3.9%
Other values (10)  107011  19.4%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   523138  94.7%
Space Separator    19119   3.5%
Other Punctuation  10377   1.9%

Most frequent character per category

Lowercase Letter

Value              Count   Frequency (%)
l                  89622   17.1%
e                  66344   12.7%
r                  56887   10.9%
o                  41000   7.8%
s                  36140   6.9%
w                  35943   6.9%
a                  34742   6.6%
i                  31988   6.1%
h                  31346   6.0%
n                  21611   4.1%
Other values (8)   77515   14.8%

Space Separator

Value              Count   Frequency (%)
(space)            19119   100.0%

Other Punctuation

Value              Count   Frequency (%)
/                  10377   100.0%

Most occurring scripts

Value              Count   Frequency (%)
Latin              523138  94.7%
Common             29496   5.3%

Most frequent character per script

Latin

Value              Count   Frequency (%)
l                  89622   17.1%
e                  66344   12.7%
r                  56887   10.9%
o                  41000   7.8%
s                  36140   6.9%
w                  35943   6.9%
a                  34742   6.6%
i                  31988   6.1%
h                  31346   6.0%
n                  21611   4.1%
Other values (8)   77515   14.8%

Common

Value              Count   Frequency (%)
(space)            19119   64.8%
/                  10377   35.2%

Most occurring blocks

Value              Count   Frequency (%)
ASCII              552634  100.0%

Most frequent character per block

Value              Count   Frequency (%)
l                  89622   16.2%
e                  66344   12.0%
r                  56887   10.3%
o                  41000   7.4%
s                  36140   6.5%
w                  35943   6.5%
a                  34742   6.3%
i                  31988   5.8%
h                  31346   5.7%
n                  21611   3.9%
Other values (10)  107011  19.4%

source_class
Categorical

HIGH CORRELATION

Distinct      3
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   928.1 KiB

Value        Count
groundwater  45794
surface      13328
unknown      278

Length

Max length     11
Median length  11
Mean length    10.08377104
Min length     7

Characters and Unicode

Total characters     598976
Distinct characters  14
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  groundwater
2nd row  surface
3rd row  surface
4th row  groundwater
5th row  surface
Value        Count  Frequency (%)
groundwater  45794  77.1%
surface      13328  22.4%
unknown      278    0.5%
Histogram of lengths of the category
Value        Count  Frequency (%)
groundwater  45794  77.1%
surface      13328  22.4%
unknown      278    0.5%

Most occurring characters

Value              Count   Frequency (%)
r                  104916  17.5%
u                  59400   9.9%
a                  59122   9.9%
e                  59122   9.9%
n                  46628   7.8%
o                  46072   7.7%
w                  46072   7.7%
g                  45794   7.6%
d                  45794   7.6%
t                  45794   7.6%
Other values (4)   40262   6.7%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   598976  100.0%

Most frequent character per category

Lowercase Letter

Value              Count   Frequency (%)
r                  104916  17.5%
u                  59400   9.9%
a                  59122   9.9%
e                  59122   9.9%
n                  46628   7.8%
o                  46072   7.7%
w                  46072   7.7%
g                  45794   7.6%
d                  45794   7.6%
t                  45794   7.6%
Other values (4)   40262   6.7%

Most occurring scripts

Value              Count   Frequency (%)
Latin              598976  100.0%

Most frequent character per script

Latin

Value              Count   Frequency (%)
r                  104916  17.5%
u                  59400   9.9%
a                  59122   9.9%
e                  59122   9.9%
n                  46628   7.8%
o                  46072   7.7%
w                  46072   7.7%
g                  45794   7.6%
d                  45794   7.6%
t                  45794   7.6%
Other values (4)   40262   6.7%

Most occurring blocks

Value              Count   Frequency (%)
ASCII              598976  100.0%

Most frequent character per block

Value              Count   Frequency (%)
r                  104916  17.5%
u                  59400   9.9%
a                  59122   9.9%
e                  59122   9.9%
n                  46628   7.8%
o                  46072   7.7%
w                  46072   7.7%
g                  45794   7.6%
d                  45794   7.6%
t                  45794   7.6%
Other values (4)   40262   6.7%

waterpoint_type
Categorical

HIGH CORRELATION

Distinct      7
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   928.1 KiB

Value                        Count
communal standpipe           28522
hand pump                    17488
other                        6380
communal standpipe multiple  6103
improved spring              784
Other values (2)             123

Length

Max length     27
Median length  18
Mean length    14.82757576
Min length     3

Characters and Unicode

Total characters     880758
Distinct characters  18
Distinct categories  2
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  communal standpipe
2nd row  communal standpipe
3rd row  communal standpipe multiple
4th row  communal standpipe multiple
5th row  communal standpipe
Value                        Count  Frequency (%)
communal standpipe           28522  48.0%
hand pump                    17488  29.4%
other                        6380   10.7%
communal standpipe multiple  6103   10.3%
improved spring              784    1.3%
cattle trough                116    0.2%
dam                          7      < 0.1%
Histogram of lengths of the category
Value      Count  Frequency (%)
standpipe  34625  29.2%
communal   34625  29.2%
hand       17488  14.8%
pump       17488  14.8%
other      6380   5.4%
multiple   6103   5.1%
improved   784    0.7%
spring     784    0.7%
trough     116    0.1%
cattle     116    0.1%

Most occurring characters

Value              Count   Frequency (%)
p                  111897  12.7%
m                  93632   10.6%
n                  87522   9.9%
a                  86861   9.9%
(space)            59116   6.7%
u                  58332   6.6%
d                  52904   6.0%
e                  48008   5.5%
t                  47456   5.4%
l                  46947   5.3%
Other values (8)   188083  21.4%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   821642  93.3%
Space Separator    59116   6.7%

Most frequent character per category

Lowercase Letter

Value              Count   Frequency (%)
p                  111897  13.6%
m                  93632   11.4%
n                  87522   10.7%
a                  86861   10.6%
u                  58332   7.1%
d                  52904   6.4%
e                  48008   5.8%
t                  47456   5.8%
l                  46947   5.7%
i                  42296   5.1%
Other values (7)   145787  17.7%

Space Separator

Value              Count   Frequency (%)
(space)            59116   100.0%

Most occurring scripts

Value              Count   Frequency (%)
Latin              821642  93.3%
Common             59116   6.7%

Most frequent character per script

Latin

Value              Count   Frequency (%)
p                  111897  13.6%
m                  93632   11.4%
n                  87522   10.7%
a                  86861   10.6%
u                  58332   7.1%
d                  52904   6.4%
e                  48008   5.8%
t                  47456   5.8%
l                  46947   5.7%
i                  42296   5.1%
Other values (7)   145787  17.7%

Common

Value              Count   Frequency (%)
(space)            59116   100.0%

Most occurring blocks

Value              Count   Frequency (%)
ASCII              880758  100.0%

Most frequent character per block

Value              Count   Frequency (%)
p                  111897  12.7%
m                  93632   10.6%
n                  87522   9.9%
a                  86861   9.9%
(space)            59116   6.7%
u                  58332   6.6%
d                  52904   6.0%
e                  48008   5.5%
t                  47456   5.4%
l                  46947   5.3%
Other values (8)   188083  21.4%

waterpoint_type_group
Categorical

HIGH CORRELATION

Distinct      6
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   928.1 KiB

Value               Count
communal standpipe  34625
hand pump           17488
other               6380
improved spring     784
cattle trough       116

Length

Max length     18
Median length  18
Mean length    13.90287879
Min length     3

Characters and Unicode

Total characters     825831
Distinct characters  18
Distinct categories  2
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  communal standpipe
2nd row  communal standpipe
3rd row  communal standpipe
4th row  communal standpipe
5th row  communal standpipe
Value               Count  Frequency (%)
communal standpipe  34625  58.3%
hand pump           17488  29.4%
other               6380   10.7%
improved spring     784    1.3%
cattle trough       116    0.2%
dam                 7      < 0.1%
Histogram of lengths of the category
Value      Count  Frequency (%)
standpipe  34625  30.8%
communal   34625  30.8%
hand       17488  15.6%
pump       17488  15.6%
other      6380   5.7%
improved   784    0.7%
spring     784    0.7%
trough     116    0.1%
cattle     116    0.1%
dam        7      < 0.1%

Most occurring characters

Value              Count   Frequency (%)
p                  105794  12.8%
m                  87529   10.6%
n                  87522   10.6%
a                  86861   10.5%
(space)            53013   6.4%
d                  52904   6.4%
u                  52229   6.3%
o                  41905   5.1%
e                  41905   5.1%
t                  41353   5.0%
Other values (8)   174816  21.2%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   772818  93.6%
Space Separator    53013   6.4%

Most frequent character per category

Lowercase Letter

Value              Count   Frequency (%)
p                  105794  13.7%
m                  87529   11.3%
n                  87522   11.3%
a                  86861   11.2%
d                  52904   6.8%
u                  52229   6.8%
o                  41905   5.4%
e                  41905   5.4%
t                  41353   5.4%
i                  36193   4.7%
Other values (7)   138623  17.9%

Space Separator

Value              Count   Frequency (%)
(space)            53013   100.0%

Most occurring scripts

Value              Count   Frequency (%)
Latin              772818  93.6%
Common             53013   6.4%

Most frequent character per script

Latin

Value              Count   Frequency (%)
p                  105794  13.7%
m                  87529   11.3%
n                  87522   11.3%
a                  86861   11.2%
d                  52904   6.8%
u                  52229   6.8%
o                  41905   5.4%
e                  41905   5.4%
t                  41353   5.4%
i                  36193   4.7%
Other values (7)   138623  17.9%

Common

Value              Count   Frequency (%)
(space)            53013   100.0%

Most occurring blocks

Value              Count   Frequency (%)
ASCII              825831  100.0%

Most frequent character per block

Value              Count   Frequency (%)
p                  105794  12.8%
m                  87529   10.6%
n                  87522   10.6%
a                  86861   10.5%
(space)            53013   6.4%
d                  52904   6.4%
u                  52229   6.3%
o                  41905   5.1%
e                  41905   5.1%
t                  41353   5.0%
Other values (8)   174816  21.2%

status_group
Categorical

HIGH CORRELATION

Distinct      3
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   928.1 KiB

Value                    Count
functional               32259
non functional           22824
functional needs repair  4317

Length

Max length     23
Median length  10
Mean length    12.48176768
Min length     10

Characters and Unicode

Total characters     741417
Distinct characters  15
Distinct categories  2
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  functional
2nd row  functional
3rd row  functional
4th row  non functional
5th row  functional
Value                    Count  Frequency (%)
functional               32259  54.3%
non functional           22824  38.4%
functional needs repair  4317   7.3%
Histogram of lengths of the category
Value       Count  Frequency (%)
functional  59400  65.4%
non         22824  25.1%
needs       4317   4.8%
repair      4317   4.8%

Most occurring characters

Value              Count   Frequency (%)
n                  168765  22.8%
o                  82224   11.1%
i                  63717   8.6%
a                  63717   8.6%
f                  59400   8.0%
u                  59400   8.0%
c                  59400   8.0%
t                  59400   8.0%
l                  59400   8.0%
(space)            31458   4.2%
Other values (5)   34536   4.7%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   709959  95.8%
Space Separator    31458   4.2%

Most frequent character per category

Lowercase Letter

Value              Count   Frequency (%)
n                  168765  23.8%
o                  82224   11.6%
i                  63717   9.0%
a                  63717   9.0%
f                  59400   8.4%
u                  59400   8.4%
c                  59400   8.4%
t                  59400   8.4%
l                  59400   8.4%
e                  12951   1.8%
Other values (4)   21585   3.0%

Space Separator

Value              Count   Frequency (%)
(space)            31458   100.0%

Most occurring scripts

Value              Count   Frequency (%)
Latin              709959  95.8%
Common             31458   4.2%

Most frequent character per script

Latin

Value              Count   Frequency (%)
n                  168765  23.8%
o                  82224   11.6%
i                  63717   9.0%
a                  63717   9.0%
f                  59400   8.4%
u                  59400   8.4%
c                  59400   8.4%
t                  59400   8.4%
l                  59400   8.4%
e                  12951   1.8%
Other values (4)   21585   3.0%

Common

Value              Count   Frequency (%)
(space)            31458   100.0%

Most occurring blocks

Value              Count   Frequency (%)
ASCII              741417  100.0%

Most frequent character per block

Value              Count   Frequency (%)
n                  168765  22.8%
o                  82224   11.1%
i                  63717   8.6%
a                  63717   8.6%
f                  59400   8.0%
u                  59400   8.0%
c                  59400   8.0%
t                  59400   8.0%
l                  59400   8.0%
(space)            31458   4.2%
Other values (5)   34536   4.7%

Interactions

2021-04-14T09:52:02.644234image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2021-04-14T09:52:02.844799image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2021-04-14T09:52:03.058160image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2021-04-14T09:52:03.254639image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2021-04-14T09:52:03.470851image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2021-04-14T09:52:03.671155image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/
2021-04-14T09:52:03.882984image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line, that is, its angle to the x-axis, does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
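
The calculation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code the report generator runs; the function name `pearson_r` and the sample values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values, chosen only to illustrate location/scale invariance.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 3.5, 9.0]
shifted_and_scaled_x = [3.0 * v + 5.0 for v in x]

r_original = pearson_r(x, y)
r_transformed = pearson_r(shifted_and_scaled_x, y)  # equal to r_original up to rounding
```

Shifting or rescaling one variable by a positive factor leaves the result unchanged, which is the invariance property mentioned above.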

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
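
As a rough sketch of that recipe, the snippet below ranks each variable (giving tied values their average rank, which is an assumption about tie handling) and then applies the Pearson formula to the ranks. The function names and the sample data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)  # 1.0 up to rounding, because the ranks agree exactly
r = pearson_r(x, y)       # below 1, because the relationship is not linear
```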

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
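
The pair-counting rule above corresponds to the variant usually called tau-a. The sketch below implements exactly that rule; it does not adjust for ties, whereas libraries such as SciPy default to the tie-adjusted tau-b, so results may differ on data with many repeated values. The function name and the inputs are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(xs)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: two concordant pairs and one discordant pair out of three.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])  # (2 - 1) / 3
```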

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
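
For illustration, a bias-corrected Cramér's V can be sketched as follows. This follows the correction commonly attributed to Bergsma (2013): the squared phi coefficient and the table dimensions are each adjusted by a term that shrinks with the sample size. It is a sketch under that assumption, not the report generator's own code; the function name and the category labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Corrected phi-squared and corrected table dimensions.
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    # A constant column gives denom == 0; report no association in that case.
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical labels: the second variable is fully determined by the first.
first = ["a"] * 50 + ["b"] * 50
second = ["x"] * 50 + ["y"] * 50
v_perfect = cramers_v_corrected(first, second)

# Hypothetical labels: every combination occurs equally often (independence).
v_independent = cramers_v_corrected(["a", "a", "b", "b"], ["x", "y", "x", "y"])
```

With perfectly associated labels the corrected coefficient is 1, and with a perfectly balanced table it is 0, matching the range described above.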

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
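
To make the idea of nullity correlation concrete, the sketch below turns two columns into presence indicators (1 if a value is present, 0 if it is missing) and correlates those indicators. It assumes that the heatmap is based on an ordinary Pearson correlation of such indicators. The values are the scheme_management and scheme_name entries of the ten "First rows" shown in the Sample section, with None standing in for NaN; ten rows are far too few to say anything about the full dataset, and the function name is hypothetical.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    present_a = [0 if v is None else 1 for v in col_a]
    present_b = [0 if v is None else 1 for v in col_b]
    n = len(present_a)
    mean_a = sum(present_a) / n
    mean_b = sum(present_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(present_a, present_b))
    var_a = sum((a - mean_a) ** 2 for a in present_a)
    var_b = sum((b - mean_b) ** 2 for b in present_b)
    if var_a == 0 or var_b == 0:
        return None  # a column that is always present (or always missing) has no nullity correlation
    return cov / sqrt(var_a * var_b)


# Values taken from the ten "First rows" of the sample (None marks NaN).
scheme_management = ["VWC", "Other", "VWC", "VWC", None, "VWC", "VWC", None, "VWC", None]
scheme_name = ["Roman", None, "Nyumba ya mungu pipe scheme", None, None,
               "Zingibali", None, None, None, None]

corr = nullity_correlation(scheme_management, scheme_name)  # about 0.43 on these ten rows
```

A positive value means the two columns tend to be filled in (or left empty) together; a negative value means that one tends to be present when the other is missing.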

Sample

First rows

 | id | amount_tsh | date_recorded | funder | gps_height | installer | longitude | latitude | wpt_name | num_private | basin | subvillage | region | region_code | district_code | lga | ward | population | public_meeting | recorded_by | scheme_management | scheme_name | permit | construction_year | extraction_type | extraction_type_group | extraction_type_class | management | management_group | payment | payment_type | water_quality | quality_group | quantity | quantity_group | source | source_type | source_class | waterpoint_type | waterpoint_type_group | status_group
0 | 69572 | 6000.0 | 2011-03-14 | Roman | 1390 | Roman | 34.938093 | -9.856322 | none | 0 | Lake Nyasa | Mnyusi B | Iringa | 11 | 5 | Ludewa | Mundindi | 109 | True | GeoData Consultants Ltd | VWC | Roman | False | 1999 | gravity | gravity | gravity | vwc | user-group | pay annually | annually | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe | functional
1 | 8776 | 0.0 | 2013-03-06 | Grumeti | 1399 | GRUMETI | 34.698766 | -2.147466 | Zahanati | 0 | Lake Victoria | Nyamara | Mara | 20 | 2 | Serengeti | Natta | 280 | NaN | GeoData Consultants Ltd | Other | NaN | True | 2010 | gravity | gravity | gravity | wug | user-group | never pay | never pay | soft | good | insufficient | insufficient | rainwater harvesting | rainwater harvesting | surface | communal standpipe | communal standpipe | functional
2 | 34310 | 25.0 | 2013-02-25 | Lottery Club | 686 | World vision | 37.460664 | -3.821329 | Kwa Mahundi | 0 | Pangani | Majengo | Manyara | 21 | 4 | Simanjiro | Ngorika | 250 | True | GeoData Consultants Ltd | VWC | Nyumba ya mungu pipe scheme | True | 2009 | gravity | gravity | gravity | vwc | user-group | pay per bucket | per bucket | soft | good | enough | enough | dam | dam | surface | communal standpipe multiple | communal standpipe | functional
3 | 67743 | 0.0 | 2013-01-28 | Unicef | 263 | UNICEF | 38.486161 | -11.155298 | Zahanati Ya Nanyumbu | 0 | Ruvuma / Southern Coast | Mahakamani | Mtwara | 90 | 63 | Nanyumbu | Nanyumbu | 58 | True | GeoData Consultants Ltd | VWC | NaN | True | 1986 | submersible | submersible | submersible | vwc | user-group | never pay | never pay | soft | good | dry | dry | machine dbh | borehole | groundwater | communal standpipe multiple | communal standpipe | non functional
4 | 19728 | 0.0 | 2011-07-13 | Action In A | 0 | Artisan | 31.130847 | -1.825359 | Shuleni | 0 | Lake Victoria | Kyanyamisa | Kagera | 18 | 1 | Karagwe | Nyakasimbi | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | gravity | gravity | gravity | other | other | never pay | never pay | soft | good | seasonal | seasonal | rainwater harvesting | rainwater harvesting | surface | communal standpipe | communal standpipe | functional
5 | 9944 | 20.0 | 2011-03-13 | Mkinga Distric Coun | 0 | DWE | 39.172796 | -4.765587 | Tajiri | 0 | Pangani | Moa/Mwereme | Tanga | 4 | 8 | Mkinga | Moa | 1 | True | GeoData Consultants Ltd | VWC | Zingibali | True | 2009 | submersible | submersible | submersible | vwc | user-group | pay per bucket | per bucket | salty | salty | enough | enough | other | other | unknown | communal standpipe multiple | communal standpipe | functional
6 | 19816 | 0.0 | 2012-10-01 | Dwsp | 0 | DWSP | 33.362410 | -3.766365 | Kwa Ngomho | 0 | Internal | Ishinabulandi | Shinyanga | 17 | 3 | Shinyanga Rural | Samuye | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | swn 80 | swn 80 | handpump | vwc | user-group | never pay | never pay | soft | good | enough | enough | machine dbh | borehole | groundwater | hand pump | hand pump | non functional
7 | 54551 | 0.0 | 2012-10-09 | Rwssp | 0 | DWE | 32.620617 | -4.226198 | Tushirikiane | 0 | Lake Tanganyika | Nyawishi Center | Shinyanga | 17 | 3 | Kahama | Chambo | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | nira/tanira | nira/tanira | handpump | wug | user-group | unknown | unknown | milky | milky | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump | non functional
8 | 53934 | 0.0 | 2012-11-03 | Wateraid | 0 | Water Aid | 32.711100 | -5.146712 | Kwa Ramadhan Musa | 0 | Lake Tanganyika | Imalauduki | Tabora | 14 | 6 | Tabora Urban | Itetemia | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | india mark ii | india mark ii | handpump | vwc | user-group | never pay | never pay | salty | salty | seasonal | seasonal | machine dbh | borehole | groundwater | hand pump | hand pump | non functional
9 | 46144 | 0.0 | 2011-08-03 | Isingiro Ho | 0 | Artisan | 30.626991 | -1.257051 | Kwapeto | 0 | Lake Victoria | Mkonomre | Kagera | 18 | 1 | Karagwe | Kaisho | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | nira/tanira | nira/tanira | handpump | vwc | user-group | never pay | never pay | soft | good | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump | functional

Last rows

 | id | amount_tsh | date_recorded | funder | gps_height | installer | longitude | latitude | wpt_name | num_private | basin | subvillage | region | region_code | district_code | lga | ward | population | public_meeting | recorded_by | scheme_management | scheme_name | permit | construction_year | extraction_type | extraction_type_group | extraction_type_class | management | management_group | payment | payment_type | water_quality | quality_group | quantity | quantity_group | source | source_type | source_class | waterpoint_type | waterpoint_type_group | status_group
59390 | 13677 | 0.0 | 2011-08-04 | Rudep | 1715 | DWE | 31.370848 | -8.258160 | Kwa Mzee Atanas | 0 | Lake Tanganyika | Kitonto | Rukwa | 15 | 2 | Sumbawanga Rural | Mkowe | 150 | True | GeoData Consultants Ltd | VWC | NaN | False | 1991 | swn 80 | swn 80 | handpump | vwc | user-group | never pay | never pay | soft | good | insufficient | insufficient | machine dbh | borehole | groundwater | hand pump | hand pump | functional
59391 | 44885 | 0.0 | 2013-08-03 | Government Of Tanzania | 540 | Government | 38.044070 | -4.272218 | Kwa | 0 | Pangani | Maore Kati | Kilimanjaro | 3 | 3 | Same | Maore | 210 | True | GeoData Consultants Ltd | Water authority | Hingilili | True | 1967 | gravity | gravity | gravity | vwc | user-group | never pay | never pay | soft | good | enough | enough | river | river/lake | surface | communal standpipe | communal standpipe | non functional
59392 | 40607 | 0.0 | 2011-04-15 | Government Of Tanzania | 0 | Government | 33.009440 | -8.520888 | Benard Charles | 0 | Lake Rukwa | Mbuyuni A | Mbeya | 12 | 1 | Chunya | Mbuyuni | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | gravity | gravity | gravity | vwc | user-group | never pay | never pay | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe | non functional
59393 | 48348 | 0.0 | 2012-10-27 | Private | 0 | Private | 33.866852 | -4.287410 | Kwa Peter | 0 | Internal | Masanga | Tabora | 14 | 2 | Igunga | Igunga | 0 | False | GeoData Consultants Ltd | Water authority | NaN | False | 0 | gravity | gravity | gravity | private operator | commercial | pay per bucket | per bucket | soft | good | insufficient | insufficient | dam | dam | surface | other | other | functional
59394 | 11164 | 500.0 | 2011-03-09 | World Bank | 351 | ML appro | 37.634053 | -6.124830 | Chimeredya | 0 | Wami / Ruvu | Komstari | Morogoro | 5 | 6 | Mvomero | Diongoya | 89 | True | GeoData Consultants Ltd | VWC | NaN | True | 2007 | submersible | submersible | submersible | vwc | user-group | pay monthly | monthly | soft | good | enough | enough | machine dbh | borehole | groundwater | communal standpipe | communal standpipe | non functional
59395 | 60739 | 10.0 | 2013-05-03 | Germany Republi | 1210 | CES | 37.169807 | -3.253847 | Area Three Namba 27 | 0 | Pangani | Kiduruni | Kilimanjaro | 3 | 5 | Hai | Masama Magharibi | 125 | True | GeoData Consultants Ltd | Water Board | Losaa Kia water supply | True | 1999 | gravity | gravity | gravity | water board | user-group | pay per bucket | per bucket | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe | functional
59396 | 27263 | 4700.0 | 2011-05-07 | Cefa-njombe | 1212 | Cefa | 35.249991 | -9.070629 | Kwa Yahona Kuvala | 0 | Rufiji | Igumbilo | Iringa | 11 | 4 | Njombe | Ikondo | 56 | True | GeoData Consultants Ltd | VWC | Ikondo electrical water sch | True | 1996 | gravity | gravity | gravity | vwc | user-group | pay annually | annually | soft | good | enough | enough | river | river/lake | surface | communal standpipe | communal standpipe | functional
59397 | 37057 | 0.0 | 2011-04-11 | NaN | 0 | NaN | 34.017087 | -8.750434 | Mashine | 0 | Rufiji | Madungulu | Mbeya | 12 | 7 | Mbarali | Chimala | 0 | True | GeoData Consultants Ltd | VWC | NaN | False | 0 | swn 80 | swn 80 | handpump | vwc | user-group | pay monthly | monthly | fluoride | fluoride | enough | enough | machine dbh | borehole | groundwater | hand pump | hand pump | functional
59398 | 31282 | 0.0 | 2011-03-08 | Malec | 0 | Musa | 35.861315 | -6.378573 | Mshoro | 0 | Rufiji | Mwinyi | Dodoma | 1 | 4 | Chamwino | Mvumi Makulu | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | nira/tanira | nira/tanira | handpump | vwc | user-group | never pay | never pay | soft | good | insufficient | insufficient | shallow well | shallow well | groundwater | hand pump | hand pump | functional
59399 | 26348 | 0.0 | 2011-03-23 | World Bank | 191 | World | 38.104048 | -6.747464 | Kwa Mzee Lugawa | 0 | Wami / Ruvu | Kikatanyemba | Morogoro | 5 | 2 | Morogoro Rural | Ngerengere | 150 | True | GeoData Consultants Ltd | VWC | NaN | True | 2002 | nira/tanira | nira/tanira | handpump | vwc | user-group | pay when scheme fails | on failure | salty | salty | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump | functional